Multiplexed CHO single cell RNA-seq

Clarke Lab. NIBRT

2022-03-30

Overview

This R notebook enables the reproduction of the analysis described in:

Tzani et al. Understanding the transcriptional response to ER stress in Chinese hamster ovary cells using multiplexed single cell RNA-seq (2022) bioRxiv

In this study, we report the use of oligonucleotide barcoding to perform multiplexed CHO cell scRNA-seq to study the impact of tunicamycin (TM), an inducer of the unfolded protein response (UPR). For this experiment, we treated a CHO-K1 GS cell line with 10μg/ml tunicamycin and acquired samples at 1, 2, 4 and 8 hr post-treatment as well as a non-treated TM- control. We transfected cells from each sample with a distinct polyadenylated ssDNA oligonucleotide barcode before pooling all samples for scRNA-seq.

Raw Data Availibility

The FASTQ files for this study are available from the NCBI Sequence Read Archive ID: PRJNA821033

The associated github repository can be found here

Data Analysis

Prepare

#load packages
package_list <- c(
  "DropletUtils", "Matrix", "tidyverse", "readxl", "writexl", "scales", 
  "ggridges", "WGCNA", "scWGCNA", "DT", "Seurat", "viridis", "WebGestaltR", 
  "ggpubr", "patchwork", "scater","ggExtra", "SeuratWrappers", "langevitour", 
  "biomaRt","RColorBrewer", "plyr"
)
invisible(lapply(package_list, require, character.only = TRUE))
## Loading required package: DropletUtils
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following object is masked from 'package:S4Vectors':
## 
##     expand
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::collapse()   masks IRanges::collapse()
## x dplyr::combine()    masks Biobase::combine(), BiocGenerics::combine()
## x dplyr::count()      masks matrixStats::count()
## x dplyr::desc()       masks IRanges::desc()
## x tidyr::expand()     masks Matrix::expand(), S4Vectors::expand()
## x dplyr::filter()     masks stats::filter()
## x dplyr::first()      masks S4Vectors::first()
## x dplyr::lag()        masks stats::lag()
## x tidyr::pack()       masks Matrix::pack()
## x ggplot2::Position() masks BiocGenerics::Position(), base::Position()
## x purrr::reduce()     masks GenomicRanges::reduce(), IRanges::reduce()
## x dplyr::rename()     masks S4Vectors::rename()
## x dplyr::slice()      masks IRanges::slice()
## x tidyr::unpack()     masks Matrix::unpack()
## Loading required package: readxl
## Loading required package: writexl
## Loading required package: scales
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
## Loading required package: ggridges
## Loading required package: WGCNA
## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
## 
## Attaching package: 'fastcluster'
## The following object is masked from 'package:stats':
## 
##     hclust
## 
## 
## Attaching package: 'WGCNA'
## The following object is masked from 'package:IRanges':
## 
##     cor
## The following object is masked from 'package:S4Vectors':
## 
##     cor
## The following object is masked from 'package:stats':
## 
##     cor
## Loading required package: scWGCNA
## Loading required package: DT
## Loading required package: Seurat
## Registered S3 method overwritten by 'spatstat.geom':
##   method     from
##   print.boxx cli
## Attaching SeuratObject
## 
## Attaching package: 'Seurat'
## The following object is masked from 'package:DT':
## 
##     JS
## The following object is masked from 'package:SummarizedExperiment':
## 
##     Assays
## Loading required package: viridis
## Loading required package: viridisLite
## 
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
## 
##     viridis_pal
## Loading required package: WebGestaltR
## ******************************************
## *                                        *
## *          Welcome to WebGestaltR !      *
## *                                        *
## ******************************************
## Loading required package: ggpubr
## Loading required package: patchwork
## Loading required package: scater
## Loading required package: scuttle
## Loading required package: ggExtra
## Loading required package: SeuratWrappers
## Loading required package: langevitour
## Loading required package: biomaRt
## Loading required package: RColorBrewer
## Loading required package: plyr
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## The following object is masked from 'package:ggpubr':
## 
##     mutate
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following object is masked from 'package:purrr':
## 
##     compact
## The following object is masked from 'package:IRanges':
## 
##     desc
## The following object is masked from 'package:S4Vectors':
## 
##     rename
## The following object is masked from 'package:matrixStats':
## 
##     count
set.seed(38)

# create a directory for the output
results_dir <- "../results/"
if (!dir.exists(results_dir)) {
  dir.create(results_dir, recursive = TRUE)
}

qPCR analysis of gene expression

Prior to conducting single cell analysis, qPCR was carried was to confirm that the expression of 3 genes and a spliced isoform of XBP1 were upregulated following tunicamycin addition. Here, we plot the fold changes for each treatment timepoint against the untreated control.

qpcr_data <- read_excel("../data/qpcr_analysis.xlsx")

qpcr_data <- qpcr_data %>%
  pivot_longer(cols = c("Hspa5", "Ddit3", "Atf4", "sXbp1")) %>%
  mutate(fold_change = signif(value, 2))

# define color pa
col_pal_vidiris <- c(
  "#fde725", "#5ec962", "#21918c",
  "#31688e", "#440154"
)

qpcr_data %>%
  ggboxplot(
    x = "Condition", y = "fold_change",
    fill = "Condition", palette = col_pal_vidiris
  ) +
  geom_point(
    data = qpcr_data, aes(x = Condition, y = fold_change),
    position = position_jitter(width = 0),
    size = 0.2, color = "grey"
  ) +
  labs(x = "", y = "Fold change") +
  facet_wrap(~name, scales = "free_y", nrow = 2) +
  theme_bw() +
  theme(
    legend.position = "none",
    strip.text.x = element_text(face = "italic"),
    strip.background = element_blank(),
    panel.spacing = unit(1, "lines"),
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  scale_y_continuous(breaks = pretty_breaks(n = 4, min.n = 0))
The upregulation of Atf4, Ddit3, Hspa5 and sXbp1 in comparison to the non-TM treated control sample observed via qPCR, confirmed the induction of the UPR

The upregulation of Atf4, Ddit3, Hspa5 and sXbp1 in comparison to the non-TM treated control sample observed via qPCR, confirmed the induction of the UPR

ggsave(
  filename = paste(results_dir, "Figure 1A.png", sep = ""),
  width = 4, height = 4, device = "png", dpi = 700
)

Preprocess cellular RNA libraries

In this section we preprocess the counts for the CHO cell scRNA-seq data. We first remove empty droplets and then filter cells barcodes to remove data that may arise from damaged cells.

Load the cellular RNA data from GEM well 1 and GEM well 2. These data have already undergone filtering for empty droplets.

# GEM well 1 
bc_rank1 <- readRDS("../data/rds_files/library1_bc_rank.rds")
cell_library_1 <- readRDS("../data/rds_files/cellular_rna_lib1.rds")

# GEM well 1
bc_rank2 <- readRDS("../data/rds_files/library2_bc_rank.rds")
cell_library_2 <- readRDS("../data/rds_files/cellular_rna_lib2.rds")

Show the knee plots illustrating filtering of empty droplets. Function was taken from from the Kallisto | Bustools tutorial

# function to construct the plot
knee_plot <- function(bc_rank) {
  knee_plt <- tibble(
    rank = bc_rank[["rank"]],
    total = bc_rank[["total"]]
  ) %>%
    distinct() %>%
    dplyr::filter(total > 0)
  annot <- tibble(
    inflection = metadata(bc_rank)[["inflection"]],
    rank_cutoff = max(bc_rank$rank[bc_rank$total > metadata(bc_rank)[["inflection"]]])
  )
  
  p <- ggplot(knee_plt, aes(total, rank)) +
    geom_line() +
    geom_hline(aes(yintercept = rank_cutoff), data = annot, linetype = 2) +
    geom_vline(aes(xintercept = inflection), data = annot, linetype = 2) +
    scale_x_log10() +
    scale_y_log10() +
    annotation_logticks() +
    labs(y = "Rank", x = "Total UMIs")
  return(p)
}

# plot
knee_plot_lib1 <- knee_plot(bc_rank1) + 
  labs(title = "Cellular RNA library 1") + theme_bw()

knee_plot_lib2 <- knee_plot(bc_rank2) + 
  labs(title = "Cellular RNA library 2") + theme_bw()

# join and tag
knee_plot_lib1 + knee_plot_lib2 + plot_annotation(tag_levels = "a")
Knee plots illustrating the removal of empty droplets for the cellular RNA data for (a) GEM well 1 and (b) GEM well 2 based on ranking the total UMI for counts per cellular barcode

Knee plots illustrating the removal of empty droplets for the cellular RNA data for (a) GEM well 1 and (b) GEM well 2 based on ranking the total UMI for counts per cellular barcode

# save
ggsave(
  filename = paste(results_dir, "Figure S2.png", sep = ""),
  width = 8, height = 4, device = "png", dpi = 700
)

Next, we create a function to filter cells from both libraries based on the number of UMIs mapping to mtDNA as well as the number of genes and UMIs detected.

filter_cells <- function(object) {
  
  # run miQC
  object <- RunMiQC(object,
    percent.mt = "percent.mt",
    nFeature_RNA = "nFeature_RNA",
    posterior.cutoff = 0.9,
    model.slot = "flexmix_model"
  )
  
  # capture the plot
  mtdna_plot <- PlotMiQC(object, color.by = "miQC.keep")
  
  # remove poor quality cells
  object <- subset(object, subset = miQC.keep == "keep")
  
  # modify miQC output plot
  mtdna_plot$layers[c(1, 2, 3)] <- NULL
  mtdna_plot <- mtdna_plot + theme_bw() +
    scale_color_manual(values = c("#888888", "#117733")) +
    # scale_color_brewer(palette = "Dark2") +
    geom_point(size = 0.2, alpha = 0.5) +
    labs(x = "# Genes", y = "% mtDNA UMIs", color = "") +
    guides(colour = guide_legend(override.aes = list(size = 5))) +
    annotate("text", -Inf, Inf,
      hjust = -1.70, vjust = 1.5, color = "#117733",
      label = paste0(dim(object)[2], " cells retained")
    ) +
    theme(
      panel.grid.major = element_blank(),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )

  # thresholds for nUMI
  log_umi_counts <- log10(object@meta.data$nCount_RNA)
  low_count_threshold <- median(log_umi_counts) - (3 * mad(log_umi_counts))
  high_count_threshold <- median(log_umi_counts) + (3 * mad(log_umi_counts))

  # thresholds for detected genes
  log_detect_genes <- log10(object@meta.data$nFeature_RNA)
  low_feature_threshold <- median(log_detect_genes) - (3 * mad(log_detect_genes))
  high_feature_threshold <- median(log_detect_genes) + (3 * mad(log_detect_genes))

  # plot data for UMI and detected genes results
  umi_v_feature_plot_data <- object@meta.data %>%
    mutate(miQC.keep = case_when(
      nFeature_RNA > 10^low_feature_threshold &
        nFeature_RNA < 10^high_feature_threshold &
        nCount_RNA > 10^low_count_threshold &
        nCount_RNA < 10^high_count_threshold ~ "keep",
      TRUE ~ "discard"
    ))
  
  # filter cells
  object <- subset(
    object,
    nFeature_RNA > 10^low_feature_threshold &
      nFeature_RNA < 10^high_feature_threshold &
      nCount_RNA > 10^low_count_threshold &
      nCount_RNA < 10^high_count_threshold
  )

  # plot UMI and detected genes results
  umi_v_feature_plot <- umi_v_feature_plot_data %>%
    ggplot(aes(x = log10(nCount_RNA), y = log10(nFeature_RNA), color = miQC.keep)) +
    geom_point(size = 0.2, alpha = 0.5) +
    scale_color_manual(values = c("#888888", "#117733")) +
    # geom_smooth(method = "lm") +
    geom_hline(aes(yintercept = low_feature_threshold), colour = "grey", linetype = 2) +
    geom_hline(aes(yintercept = high_feature_threshold), colour = "grey", linetype = 2) +
    geom_vline(aes(xintercept = high_count_threshold), colour = "grey", linetype = 2) +
    geom_vline(aes(xintercept = low_count_threshold), colour = "grey", linetype = 2) +
    theme_bw() +
    labs(x = "Log10 # UMIs", y = "Log10 # Genes", color = "") +
    guides(colour = guide_legend(override.aes = list(size = 5))) +
    annotate("text", -Inf, Inf,
      hjust = -0.1, vjust = 1.5, color = "#117733",
      label = paste0(dim(object)[2], " cells retained")
    ) +
    theme(
      panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )


  # show the filtered cell summary
  print(object)

  # results list
  result_list <- list(
    "mtdna_plot" = mtdna_plot,
    "filtered_object" = object,
    "umi_feature_plot" = umi_v_feature_plot,
    "umi_feature_plot_data" = umi_v_feature_plot_data
  )

  return(result_list)
}

Specify the mitochondrial protein coding genes from the CHO-K1 genome

mito <- c("ENSCGRG00000000006.1","ENSCGRG00000000010.1","ENSCGRG00000000016.1",
          "ENSCGRG00000000019.1","ENSCGRG00000000021.1","ENSCGRG00000000022.1",
          "ENSCGRG00000000023.1","ENSCGRG00000000025.1","ENSCGRG00000000027.1",
          "ENSCGRG00000000028.1","ENSCGRG00000000032.1","ENSCGRG00000000033.1",
          "ENSCGRG00000000035.1")

Preprocess GEM well 1 RNA data.

# determine the percentage mapping to mtDNA
cell_library_1[["percent.mt"]] <- PercentageFeatureSet(cell_library_1, 
                                                       features = mito)
# preprocess
cell_rna_library_1_filter_output <- filter_cells(cell_library_1)
## Warning: Adding a command log without an assay associated with it
## An object of class Seurat 
## 15502 features across 3201 samples within 1 assay 
## Active assay: RNA (15502 features, 0 variable features)
cell_library_1 <- cell_rna_library_1_filter_output$filtered_object

Preprocess GEM well 2 RNA data..

# determine the percentage mapping to mtDNA
cell_library_2[["percent.mt"]] <- PercentageFeatureSet(cell_library_2, 
                                                       features = mito)
# preprocess
cell_rna_library_2_filter_output <- filter_cells(cell_library_2)
## An object of class Seurat 
## 15493 features across 3381 samples within 1 assay 
## Active assay: RNA (15493 features, 0 variable features)
cell_library_2 <- cell_rna_library_2_filter_output$filtered_object

Plot the results of cellular RNA preprocessing for GEM well 1 and 2

(
  (cell_rna_library_1_filter_output$mtdna_plot + plot_spacer() + 
    cell_rna_library_1_filter_output$umi_feature_plot + 
    plot_layout(widths = c(2,0.1,2))
   ) 
  / 
(
  cell_rna_library_2_filter_output$mtdna_plot + plot_spacer()  + 
   cell_rna_library_2_filter_output$umi_feature_plot + 
   plot_layout(widths = c(2,0.1,2))
  )
) + 
  plot_annotation(tag_levels = "a") + plot_layout(guides = "collect") & theme(legend.position = "bottom")
miQC was used to assess the relationship between the number of UMIs originating from mtDNA and the number genes detected to identify low quality cells in the cellular RNA libraries from (a) GEM well 1 and (c) GEM well 2. If the number of UMIs or number of genes detected per cell was beyond 3 MAD of the median of the population of (b) GEM well 1 or (d) GEM well 2 those cells were also discarded.

miQC was used to assess the relationship between the number of UMIs originating from mtDNA and the number genes detected to identify low quality cells in the cellular RNA libraries from (a) GEM well 1 and (c) GEM well 2. If the number of UMIs or number of genes detected per cell was beyond 3 MAD of the median of the population of (b) GEM well 1 or (d) GEM well 2 those cells were also discarded.

ggsave(filename = paste(results_dir, "Figure S3.png", sep = ""), 
       width = 9, height = 9, device = "png", dpi = 700)

Sample classification using SBO data

First, we define a function to integrate the SBO with its cellular RNA data.

sbo_integration <- function(object, sbo_matrix, library) {

  # extract the RNA counts from the Seurat object
  seurat_count_matrix <- as.matrix(GetAssayData(
    object = object,
    slot = "counts"
  ))

  # determine the cell barcodes shared between SBO and RNA
  joint_barcodes <- intersect(
    colnames(sbo_matrix),
    colnames(seurat_count_matrix)
  )

  print(paste(length(joint_barcodes), "barcodes matched between mRNA and SBO library"))

  # create a RNA count matrix containing only the shared barcodes
  matched_count_matrix <- seurat_count_matrix[, joint_barcodes]

  # create a new Seurat object for the RNA counts
  new_object <- CreateSeuratObject(
    counts = matched_count_matrix,
    project = library
  )

  # set the percent mt for new object using the previous values
  new_object[["percent.mt"]] <- object$percent.mt[joint_barcodes]

  # add meta data for the library
  new_object@meta.data$library <- rep(library, dim(new_object)[2])

  # add a new assay to hold the SBO counts
  sbo_matrix <- as.matrix(sbo_matrix[, joint_barcodes]) # intersect

  new_object[["SBO"]] <- CreateAssayObject(counts = sbo_matrix)

  return(new_object)
}

Integrate RNA and SBO counts for GEM well 1

# load the SBO counts for GEM well 2
sbo_lib_1_data <- readRDS("../data/rds_files/sbo_lib_1.data.rds")

# integrate
cho_cell_sbo_lib_1 <- sbo_integration(cell_library_1, sbo_lib_1_data, "library 1")
## [1] "3200 barcodes matched between mRNA and SBO library"
#summary
cho_cell_sbo_lib_1
## An object of class Seurat 
## 15507 features across 3200 samples within 2 assays 
## Active assay: RNA (15502 features, 0 variable features)
##  1 other assay present: SBO
# normalise SBO counts 
cho_cell_sbo_lib_1 <- NormalizeData(cho_cell_sbo_lib_1, assay = "SBO", 
                               normalization.method = "CLR")
# load the SBO counts for GEM well 2
sbo_lib_2_data <- readRDS("../data/rds_files/sbo_lib_2.data.rds")

# integrate
cho_cell_sbo_lib_2 <- sbo_integration(cell_library_2, sbo_lib_2_data, "library 2")
## [1] "3381 barcodes matched between mRNA and SBO library"
#summary
cho_cell_sbo_lib_2
## An object of class Seurat 
## 15498 features across 3381 samples within 2 assays 
## Active assay: RNA (15493 features, 0 variable features)
##  1 other assay present: SBO
# normalise SBO counts 
cho_cell_sbo_lib_2<- NormalizeData(cho_cell_sbo_lib_2, assay = "SBO", 
                               normalization.method = "CLR")

Next, we need to determine a postive quartile threshold for SBO classification for the 2 GEM well libraries.

eval_pq <- function(object, library_id) {
  
  eval_pq_out <- tibble()

  for (i in seq(from = 0.9, to = 0.999999999, by = 0.005)) {
    object <- HTODemux(object,
      assay = "SBO",
      positive.quantile = i,
      verbose = F,
      kfunc = "kmeans", init = 6
    )

    sbo_classification <- as.data.frame.matrix(table(
      object$SBO_maxID,
      object$SBO_classification.global
    )) %>%
      summarise(max.singlet = sum(Singlet)) %>%
      mutate(threshold = i)

    eval_pq_out <- bind_rows(eval_pq_out, sbo_classification) %>%
      mutate(library = library_id)
  }

  eval_pq_plot <- eval_pq_out %>%
    ggplot(aes(x = threshold, y = cells, fill = SBO_classification.global)) +
    geom_bar(position = "fill", stat = "identity") +
    theme_bw() +
    scale_fill_viridis(discrete = T, direction = -1, option = "plasma") +
    labs(y = "# Cells", x = "HTODemux positive quantile treshold", fill = "SBO identification", title = library_id)

  result_list <- list(
    "eval_pq_out" = eval_pq_out,
    "eval_pq_plot" = eval_pq_plot
  )

  return(result_list)
}

Here we select a balenced cutoff for both GEM wells to optimise the total number of singlet cells identified.

eval_pq_lib1 <- eval_pq(cho_cell_sbo_lib_1, "GEM well 1")

eval_pq_lib2 <- eval_pq(cho_cell_sbo_lib_2, "GEM well 2")

bind_rows(eval_pq_lib1$eval_pq_out, eval_pq_lib2$eval_pq_out) %>%
  ggplot(aes(x = threshold, y = max.singlet, fill = library, color = library)) +
  geom_line() +
  geom_point() +
  geom_vline(xintercept = 0.94, linetype = 11, size = 0.5, color = "grey") +
  geom_label(aes(x = 0.94, label = "HTODemux\nPostive Quantile\n= 0.94", y = 2550), inherit.aes = F, colour = "dark grey") +
  scale_color_brewer(palette = "Dark2", direction = -1) +
  labs(x = "Postive Quantile", y = "# Singlet cells", fill = "", color = "") +
  theme_bw()
A range of positive quantile value from 0.9 to 0.999999 were assessed to identify a value for both GEM well 1 and GEM well 2 datasets. 0.94 was selected as the classification threshold to maximise the number of singlet cells.

A range of positive quantile value from 0.9 to 0.999999 were assessed to identify a value for both GEM well 1 and GEM well 2 datasets. 0.94 was selected as the classification threshold to maximise the number of singlet cells.

ggsave(filename = paste(results_dir, "Figure S4.png", sep = ""), width = 6, height = 4, device = "png", dpi = 700)

Classify GEM well 1 samples.

# classify
cho_cell_sbo_lib_1 <- HTODemux(cho_cell_sbo_lib_1,
  assay = "SBO",
  positive.quantile = 0.94, kfunc = "kmeans", init = 6
)
## Cutoff for 8-hours-treated-GTATATCC : 75 reads
## Cutoff for control-CCGGACTT : 124 reads
## Cutoff for 4-hours-treated-GTCAGGAT : 85 reads
## Cutoff for 2-hours-treated-GAACTCCC : 139 reads
## Cutoff for 1-hours-treated-GCTTGCCT : 72 reads
lib1_sbo_classification <- as.data.frame.matrix(
  table(
    cho_cell_sbo_lib_1$SBO_maxID,
    cho_cell_sbo_lib_1$SBO_classification.global
  )
) %>%
  as_tibble(rownames = "Sample") %>%
  mutate(`Sample label` = case_when(
    Sample == "control-CCGGACTT" ~ "TM-",
    Sample == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr",
    Sample == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr",
    Sample == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr",
    Sample == "8-hours-treated-GTATATCC" ~ "TM+ 8hr"
  )) %>%
  dplyr::rename(`Sample ID` = Sample) %>%
  dplyr::select(
    `Sample ID`, `Sample label`,
    `Doublet`, `Negative`, `Singlet`
  ) %>%
  arrange(`Sample label`)

# output table
lib1_sbo_classification %>%
  DT::datatable()
# capture data for merged classification plot
classification_library_1 <- data.frame(table(
  cho_cell_sbo_lib_1$SBO_maxID,
  cho_cell_sbo_lib_1$SBO_classification.global
)) %>%
  mutate(library = "GEM well 1")

# remove cells with no classified SBO
rna_sbo_lib_1_subset <- subset(cho_cell_sbo_lib_1,
  idents = "Negative", invert = T
)

Classify GEM well 1 samples.

# classify
cho_cell_sbo_lib_2 <- HTODemux(cho_cell_sbo_lib_2,
  assay = "SBO",
  positive.quantile = 0.94, kfunc = "kmeans", init = 6
)
## Cutoff for 8-hours-treated-GTATATCC : 55 reads
## Cutoff for control-CCGGACTT : 90 reads
## Cutoff for 4-hours-treated-GTCAGGAT : 66 reads
## Cutoff for 2-hours-treated-GAACTCCC : 105 reads
## Cutoff for 1-hours-treated-GCTTGCCT : 49 reads
lib2_sbo_classification <- as.data.frame.matrix(
  table(
    cho_cell_sbo_lib_2$SBO_maxID,
    cho_cell_sbo_lib_2$SBO_classification.global
  )
) %>%
  as_tibble(rownames = "Sample") %>%
  mutate(`Sample label` = case_when(
    Sample == "control-CCGGACTT" ~ "TM-",
    Sample == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr",
    Sample == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr",
    Sample == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr",
    Sample == "8-hours-treated-GTATATCC" ~ "TM+ 8hr"
  )) %>%
  dplyr::rename(`Sample ID` = Sample) %>%
  dplyr::select(
    `Sample ID`, `Sample label`,
    `Doublet`, `Negative`, `Singlet`
  ) %>%
  arrange(`Sample label`)

lib2_sbo_classification %>%
  DT::datatable()
# capture data for merged classification plot
classification_library_2 <- data.frame(
  table(
    cho_cell_sbo_lib_2$SBO_maxID,
    cho_cell_sbo_lib_2$SBO_classification.global
  )
) %>%
  mutate(library = "GEM well 2")

# remove cells with no classified SBO
rna_sbo_lib_2_subset <- subset(cho_cell_sbo_lib_2,
  idents = "Negative", invert = T
)

Summarise SBO classifications and write to file.

combined_sbo_classification <- tibble(
  `Sample ID` = lib1_sbo_classification$`Sample ID`,
  `Sample label` = lib1_sbo_classification$`Sample label`,
  Doublet = lib1_sbo_classification$Doublet + lib2_sbo_classification$Doublet,
  Negative = lib1_sbo_classification$Negative + lib2_sbo_classification$Negative,
  Singlet = lib1_sbo_classification$Singlet + lib2_sbo_classification$Singlet
)

combined_sbo_classification %>%
  DT::datatable()
# write supplementary table
filename <- paste(results_dir, "Table S3.xlsx",
  sep = ""
)

write_xlsx(list(
  a = combined_sbo_classification,
  b = lib1_sbo_classification,
  c = lib2_sbo_classification
),
path = filename,
format_headers = TRUE
)

Combine the two GEM well libraries.

# merge non-negative cells
rna_sbo_combined <- merge(rna_sbo_lib_1_subset,
  y = c(rna_sbo_lib_2_subset),
  add.cell.ids = c("lib1", "lib2"),
  project = "TM_analysis", merge.data = TRUE
)

# summary
rna_sbo_combined
## An object of class Seurat 
## 15903 features across 5911 samples within 2 assays 
## Active assay: RNA (15898 features, 0 variable features)
##  1 other assay present: SBO

Visualise the SBO counts using t-Stochastic neighbour embedding.

sbo_dist_mtx <- as.matrix(dist(t(GetAssayData(object = rna_sbo_combined, assay = "SBO"))))

rna_sbo_combined <- RunTSNE(rna_sbo_combined, distance.matrix = sbo_dist_mtx, 
                            perplexity = 100)

sbo_combined_tsne_plot <- DimPlot(rna_sbo_combined)

figure_2b <- sbo_combined_tsne_plot$data %>%
  mutate(tag = case_when(
    ident == "control-CCGGACTT" ~ "TM- [CCGGACTT]",
    ident == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr [GCTTGCCT]",
    ident == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr [GAACTCCCT]",
    ident == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr [GTCAGGAT]",
    ident == "8-hours-treated-GTATATCC" ~ "TM+ 8hr [GTATATCC]",
    ident == "Doublet" ~ "Doublet",
  )) %>%
  ggplot(aes(x = tSNE_1, y = tSNE_2, color = tag)) +
  geom_point(size = 0.1) +
  theme_bw() +
  scale_color_manual(values = c("red", "#fde725", "#5ec962", "#21918c", "#31688e", "#440154")) +
  labs(x = "tSNE 1", y = "tSNE 2", color = "Sample [barcode]") +
  guides(color = guide_legend(override.aes = list(size = 2)))

figure_2b
t-Stochastic neighbour embedding (tSNE) dimensionality reduction and visualisation of the combined GEM well 1 and GEM well 2 UMI-SBO count matrices. HTODemux classifications are shown along with those cells identified as doublets.t-Stochastic neighbour embedding (tSNE) dimensionality reduction and visualisation of the combined GEM well 1 and GEM well 2 UMI-SBO count matrices. HTODemux classifications are shown along with those cells identified as doublets.

t-Stochastic neighbour embedding (tSNE) dimensionality reduction and visualisation of the combined GEM well 1 and GEM well 2 UMI-SBO count matrices. HTODemux classifications are shown along with those cells identified as doublets.t-Stochastic neighbour embedding (tSNE) dimensionality reduction and visualisation of the combined GEM well 1 and GEM well 2 UMI-SBO count matrices. HTODemux classifications are shown along with those cells identified as doublets.

ggsave(plot = figure_2b, filename = paste(results_dir, "Figure_S5.png", sep = ""), width = 6, height = 4, device = "png", dpi = 700)

Show overall sample classifications.

overall_classification_counts <- bind_rows(classification_library_1, classification_library_2) %>%
  dplyr::rename("SBO_classification.global" = Var2) %>%
   mutate(gem_well = case_when(library == "library 1" ~ "GEM well 1", 
                              TRUE ~ "GEM well 2")) %>%
  mutate(ident = case_when(
    Var1 == "control-CCGGACTT" ~ "TM- [CCGGACTT]",
    Var1 == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr [GCTTGCCT]",
    Var1 == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr [GAACTCCCT]",
    Var1 == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr [GTCAGGAT]",
    Var1 == "8-hours-treated-GTATATCC" ~ "TM+ 8hr [GTATATCC]",
    Var1 == "Doublet" ~ "Doublet",
  )) %>%
dplyr::group_by(library, SBO_classification.global) %>%
  dplyr::summarise(total_cells = sum(Freq))
## `summarise()` has grouped output by 'library'. You can override using the `.groups` argument.
fig_s6_a <- overall_classification_counts  %>%
  ggplot(aes(x = library, y = total_cells, fill = SBO_classification.global)) +
  geom_bar(stat = "identity", position = "stack") +
  geom_text(aes(x= library, y=3000, fill=SBO_classification.global, label=total_cells), 
            data = overall_classification_counts  %>% filter(SBO_classification.global =="Singlet"), 
            size = 3, position = position_stack(vjust = 0.5),
            color="white") + 
  labs(y = "# Cells", x = "", fill = "") +
  theme_bw() +
  theme(
    legend.position = "right",
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  scale_fill_viridis(discrete = T, direction = -1, option = "plasma")
## Warning: Ignoring unknown aesthetics: fill
gem_well_classification_counts <-bind_rows(classification_library_1, classification_library_2) %>%
  dplyr::rename("SBO_classification.global" = Var2) %>%
  mutate(ident = case_when(
    Var1 == "control-CCGGACTT" ~ "TM- [CCGGACTT]",
    Var1 == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr [GCTTGCCT]",
    Var1 == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr [GAACTCCCT]",
    Var1 == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr [GTCAGGAT]",
    Var1 == "8-hours-treated-GTATATCC" ~ "TM+ 8hr [GTATATCC]",
    Var1 == "Doublet" ~ "Doublet",
  ))
  

fig_s6_b <- gem_well_classification_counts  %>%
  ggplot(aes(x=ident, y=Freq, fill=SBO_classification.global, label=Freq)) +
  geom_bar(stat="identity",position = "stack", width = 0.75) +
  geom_text(aes(x=ident, y=500, fill=SBO_classification.global, label=Freq), 
            data = gem_well_classification_counts %>% filter(SBO_classification.global =="Singlet"), 
            size = 3, position = position_stack(vjust = 0.5),
            color="white") +
  labs(x="", y="# Cells", fill="") +
  facet_wrap(~library, nrow = 2) + 
  theme_bw() +
  theme(axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1), 
        strip.background = element_blank()) +
  scale_fill_viridis(discrete = T, direction = -1, option = "plasma")
## Warning: Ignoring unknown aesthetics: fill
fig_s6_a + fig_s6_b + plot_annotation(tag_levels = "a") + 
  plot_layout(guides = "collect", widths = c(0.4,1)) & 
  theme(legend.position = "bottom")
HTODemux with a positive quantile threshold = 0.94 was used to identify a total of (a) 2,645 and 2,669 singlets from GEM well 1 and GEM well 2 respectively. The (b,c) sample classifications across the 2 GEM wells were comparable. The number of singlet classifications are shown in white; the number of negative and doublet classifications are provided in Table S3. For the TM+ 4hr sample, a comparable number of SBO negative cells was observed for GEM well 1 (n = 141) and GEM well 2 (n = 190). While the origin of this variation is unknown these data indicate that the difference in detection rate of TM+ 4hr arose from the experimental steps conducted prior to cell isolation.

HTODemux with a positive quantile threshold = 0.94 was used to identify a total of (a) 2,645 and 2,669 singlets from GEM well 1 and GEM well 2 respectively. The (b,c) sample classifications across the 2 GEM wells were comparable. The number of singlet classifications are shown in white; the number of negative and doublet classifications are provided in Table S3. For the TM+ 4hr sample, a comparable number of SBO negative cells was observed for GEM well 1 (n = 141) and GEM well 2 (n = 190). While the origin of this variation is unknown these data indicate that the difference in detection rate of TM+ 4hr arose from the experimental steps conducted prior to cell isolation.

ggsave(filename = paste(results_dir, "Figure S6.png", sep = ""), 
       width = 6, height = 5, device = "png", dpi = 700)
gem_lib_singlet_counts <- bind_rows(classification_library_1, classification_library_2) %>%
  mutate(ident = case_when(
    Var1 == "control-CCGGACTT" ~ "TM-",
    Var1 == "1-hours-treated-GCTTGCCT" ~ "TM+ 1hr",
    Var1 == "2-hours-treated-GAACTCCC" ~ "TM+ 2hr",
    Var1 == "4-hours-treated-GTCAGGAT" ~ "TM+ 4hr",
    Var1 == "8-hours-treated-GTATATCC" ~ "TM+ 8hr",
    Var1 == "Doublet" ~ "Doublet",
  )) %>%
  dplyr::group_by(ident, Var2) %>%
  dplyr::summarise(total_cells = sum(Freq))
## `summarise()` has grouped output by 'ident'. You can override using the `.groups` argument.
gem_lib_singlet_counts %>%
  ggplot(aes(x=ident, y=total_cells, fill=Var2)) +
  geom_bar(stat="identity",position = "stack", width = 0.75) +
  geom_text(aes(x=ident, y=1000, label=total_cells), 
            data = gem_lib_singlet_counts %>% 
              filter(Var2 =="Singlet"), 
            size = 3, 
            position = position_stack(vjust = 0.5),
            color="white") +
  labs(x="", y="# Cells", fill="", title="SBO sample classification") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1), legend.position = "bottom") +
  scale_fill_viridis(discrete = T, direction = -1, option = "plasma")
5,314 cells were confidently assigned to one of the 5 samples.

5,314 cells were confidently assigned to one of the 5 samples.

ggsave(filename = paste(results_dir, "Figure S4.png", sep = ""), width = 4, height = 3, device = "png", dpi = 700)

Select cells with only singlet identifications and merge the data

# select only singlet cells from both libraries
Idents(cho_cell_sbo_lib_1) <- "SBO_classification.global"
cho_1.singlet <- subset(cho_cell_sbo_lib_1, idents = "Singlet")
  
Idents(cho_cell_sbo_lib_2) <- "SBO_classification.global"
cho_2.singlet <- subset(cho_cell_sbo_lib_2, idents = "Singlet")

# merge
classified_cells <- merge(cho_1.singlet,
  y = c(cho_2.singlet),
  add.cell.ids = c("lib1", "lib2"),
  project = "TM_analysis", merge.data = TRUE
)

# total cells
paste0(dim(classified_cells)[2], " cells with  sample classifications")
## [1] "5314 cells with  sample classifications"

Dimensionality reduction and visualisation

classified_cells@meta.data %>%
  group_by(treatment, Phase) %>%
  dplyr::summarise(n = n()) %>%
  dplyr::mutate(freq = n / sum(n)) %>%
  ggplot(aes(x = treatment, y = freq, fill = Phase)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_brewer(palette = "Dark2",direction = 1) +
  geom_text(aes(label=paste0(sprintf("%1.1f", freq*100),"%")),
                     position=position_stack(vjust=0.5), colour="white", size = 3) +
  #scale_fill_manual(values = c("#CC6677", "#44AA99", "#88CCEE")) +
  theme_bw() +
  labs(x="",y = "Proportion of cells", fill = "Phase")
## `summarise()` has grouped output by 'treatment'. You can override using the `.groups` argument.
The Seurat CellCycleScoring function was used to classify the phase of cell cycle for the 5,314 cells based on the expression of genes associated with the S phase and the G2M phase (a cell not classified in either phase is designated as G1). The proportion of cells predicted to be in each phase for each of the 5 samples is shown in white. The likelihood of each cell being either S or G2M is also outputted by the algorithm and was used to regress the effect of cell cycle from the data during SCTransform normalisation.

The Seurat CellCycleScoring function was used to classify the phase of cell cycle for the 5,314 cells based on the expression of genes associated with the S phase and the G2M phase (a cell not classified in either phase is designated as G1). The proportion of cells predicted to be in each phase for each of the 5 samples is shown in white. The likelihood of each cell being either S or G2M is also outputted by the algorithm and was used to regress the effect of cell cycle from the data during SCTransform normalisation.

ggsave(filename = paste(results_dir, "Figure S7.png", sep = ""), width = 5, height = 3, device = "png", dpi = 700)

Run principal components analysis

classified_cells <- RunPCA(object = classified_cells, verbose = FALSE, npcs = 50,seed.use = 42)

figure_s5 <- ElbowPlot(classified_cells,ndims = 50)

figure_s5$data %>%
  ggplot(aes(x=dims, y=stdev)) +
  geom_line() + 
  geom_point() +
  geom_vline(xintercept = 20, linetype=11, size=0.2, color="red") +
  theme_bw() +
  labs(x="Principal components", y="Standard Deviation")
Principal components analysis was performed on the SCTransform normalised gene expression data (cells = 5,314, genes = 3000). The first 20 principal components were retained for the UMAP visualisation.

Principal components analysis was performed on the SCTransform normalised gene expression data (cells = 5,314, genes = 3000). The first 20 principal components were retained for the UMAP visualisation.

ggsave(filename = paste(results_dir, "Figure S5.png", sep = ""), width = 5, height = 5, device = "png", dpi = 700)

Run UMAP

classified_cells <- RunUMAP(object = classified_cells, 
                            dims = 1:20)
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 18:49:30 UMAP embedding parameters a = 0.9922 b = 1.112
## 18:49:30 Read 5314 rows and found 20 numeric columns
## 18:49:30 Using Annoy for neighbor search, n_neighbors = 30
## 18:49:30 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 18:49:31 Writing NN index file to temp file /var/folders/6h/n6dgj8v919lbs03811dwp1m00000gn/T//RtmpexwAHu/file7d7b5a790c55
## 18:49:31 Searching Annoy index using 1 thread, search_k = 3000
## 18:49:32 Annoy recall = 100%
## 18:49:37 Commencing smooth kNN distance calibration using 1 thread
## 18:49:39 Initializing from normalized Laplacian + noise
## 18:49:39 Commencing optimization for 500 epochs, with 219954 positive edges
## 18:49:47 Optimization finished
### Supplementary Figure 6
# overlay library
seurat_dim_plot_lib <- DimPlot(classified_cells, reduction = "umap", group.by = c("library"), pt.size = 0.5)
# overlay cell cycle phase
seurat_dim_plot_cc <- DimPlot(classified_cells, reduction = "umap", group.by = c("Phase"), pt.size = 0.5)

cbp1 <- c(
  "#E69F00", "#009E73",
  "#F0E442", "#0072B2", "#D55E00", "#CC79A7"
)

figure_s6a <- seurat_dim_plot_lib$data %>%
   mutate(gem_well = case_when(library == "library 1" ~ "GEM well 1", 
                              TRUE ~ "GEM well 2")) %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = gem_well)) +
  geom_point(size = 0.2) +
  theme_bw() +
  scale_color_manual(values = cbp1) +
  labs(x = "UMAP 1", y = "UMAP 2", color = "") +
  guides(color = guide_legend(override.aes = list(size = 2)))

figure_s6b <- seurat_dim_plot_cc$data %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = Phase)) +
  geom_point(size = 0.2) +
  theme_bw() +
  scale_colour_brewer(palette = "Dark2",direction = 1) +
  # scale_color_manual(values = cbp1) +
  labs(x = "UMAP 1", y = "UMAP 2", color = "") +
  guides(color = guide_legend(override.aes = list(size = 2)))


(figure_s6a + plot_spacer() + figure_s6b) + plot_layout(widths =  c(4,0.2,4)) + plot_annotation(tag_levels = 'a') & theme(legend.position = "bottom")

ggsave(filename = paste(results_dir, "Figure S6.png", sep = ""), 
       width = 7, height = 4, device = "png", dpi = 700)
sbo_tagged_cells_umap_plot <- DimPlot(classified_cells, group.by = c("treatment"), combine = FALSE)

viridis_cp <- c("#fde725", "#5ec962", "#21918c", "#31688e", "#440154")

tm_umap <- sbo_tagged_cells_umap_plot[[1]]$data %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = treatment)) +
  geom_point(size = 0.5) +
  theme_bw() +
  coord_fixed() +
  #scale_color_brewer(palette = "Set1") +
  scale_color_manual(values = viridis_cp) +
  labs(x = "UMAP 1", y = "UMAP 2", color = "Treatment") +
  guides(color = guide_legend(override.aes = list(size = 2))) +
  theme(
    plot.title = element_text(size = 10, face = "bold.italic"),
    legend.position = "right",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )
tm_umap

ggsave(plot= tm_umap, filename = paste(results_dir, "Figure 2d.png", sep = ""), width = 5, height = 4, device = "png", dpi = 700)
classified_cells.3d <- RunUMAP(object = classified_cells, 
                            dims = 1:30, n.components = 3)
## 18:49:51 UMAP embedding parameters a = 0.9922 b = 1.112
## 18:49:51 Read 5314 rows and found 30 numeric columns
## 18:49:51 Using Annoy for neighbor search, n_neighbors = 30
## 18:49:51 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 18:49:51 Writing NN index file to temp file /var/folders/6h/n6dgj8v919lbs03811dwp1m00000gn/T//RtmpexwAHu/file7d7b2807d34d
## 18:49:51 Searching Annoy index using 1 thread, search_k = 3000
## 18:49:52 Annoy recall = 100%
## 18:49:53 Commencing smooth kNN distance calibration using 1 thread
## 18:49:56 Initializing from normalized Laplacian + noise
## 18:49:56 Commencing optimization for 500 epochs, with 225898 positive edges
## 18:50:04 Optimization finished
langevitour(as.data.frame(classified_cells.3d[["umap"]]@cell.embeddings), 
            classified_cells$treatment, pointSize = 2)

Plot ER stress marker genes

hspa5_umap <- FeaturePlot(classified_cells, features = c("ENSCGRG00015003428.1"))$data %>%
  mutate(gene = "Hspa5") %>%
  dplyr::rename(expression = "ENSCGRG00015003428.1") %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.01) +
  scale_color_viridis(option = "turbo", direction = 1) +
  theme_bw() +
  coord_fixed() +
  # scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = "Hspa5 \nExpression \nlevel") +
  theme(
    plot.title = element_text(size = 10, face = "bold.italic"),
    legend.position = "right",
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 7)
  )

ddit3_umap <- FeaturePlot(classified_cells, features = c("ENSCGRG00015028401.1"))$data %>%
  mutate(gene = "Ddit3") %>%
  dplyr::rename(expression = "ENSCGRG00015028401.1") %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.1) +
  scale_color_viridis(option = "turbo") +
  theme_bw() +
  coord_fixed() +
  # scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = "Ddit3 \nExpression \nlevel") +
  theme(
    plot.title = element_text(size = 10, face = "bold.italic"),
    legend.position = "right",
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 7)
  )

atf4_umap <- FeaturePlot(classified_cells, features = c("ENSCGRG00015004082.1"))$data %>%
  mutate(gene = "Atf4") %>%
  dplyr::rename(expression = "ENSCGRG00015004082.1") %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.1) +
  scale_color_viridis(option = "turbo") +
  theme_bw() +
  coord_fixed() +
  # scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = "Atf4 \nExpression \nlevel") +
  theme(
    plot.title = element_text(size = 10, face = "bold.italic"),
    legend.position = "right",
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 7)
  )
standard_normalisation <- NormalizeData(classified_cells,
  normalization.method = "LogNormalize",
  scale.factor = 10000, assay = "RNA"
)

hspa5_violin <- VlnPlot(standard_normalisation,
  features = c("ENSCGRG00015003428.1"),
  group.by = "treatment",
  assay = "RNA"
)$data %>%
  mutate(gene = "Hspa5") %>%
  dplyr::rename(expression = "ENSCGRG00015003428.1") %>%
  ggplot(aes(x = ident, y = `expression`, fill = ident)) +
  geom_violin() +
  geom_jitter(size = 0.1, alpha = 0.1, color = "black") +
  scale_fill_manual(values = c("#fde725", "#5ec962", 
                               "#21918c", "#31688e", "#440154")) +
  theme_bw() +
  labs(x = "", y = "Expression Level", subtitle = "Hspa5") +
  theme(
    plot.subtitle = element_text(face = "italic"),
    legend.position = "none",
    strip.text.x = element_text(face = "bold.italic"),
    strip.background = element_blank(),
    panel.spacing = unit(2, "lines"),
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  scale_y_continuous(breaks = pretty_breaks(n = 4, min.n = 0))

ddit3_violin <- VlnPlot(standard_normalisation, 
                        features = c("ENSCGRG00015028401.1"), 
                        group.by = "treatment", assay = "RNA")$data %>%
  mutate(gene = "Ddit3") %>%
  dplyr::rename(expression = "ENSCGRG00015028401.1") %>%
  ggplot(aes(x = ident, y = `expression`, fill = ident)) +
  geom_violin() +
  geom_jitter(size = 0.1, alpha = 0.1) +
  scale_fill_manual(values = c("#fde725", "#5ec962", 
                               "#21918c", "#31688e", "#440154")) +
  theme_bw() +
  labs(x = "", y = "Expression Level", subtitle = "Ddit3") +
  theme(
    plot.subtitle = element_text(face = "italic"),
    legend.position = "none",
    strip.text.x = element_text(face = "bold.italic"),
    strip.background = element_blank(),
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  scale_y_continuous(breaks = pretty_breaks(n = 4, min.n = 0))

atf4_violin <- VlnPlot(standard_normalisation, 
                       features = c("ENSCGRG00015004082.1"), 
                       group.by = "treatment", assay = "RNA")$data %>%
  mutate(gene = "Atf4") %>%
  dplyr::rename(expression = "ENSCGRG00015004082.1") %>%
  ggplot(aes(x = ident, y = `expression`, fill = ident)) +
  geom_violin() +
  geom_jitter(size = 0.1, alpha = 0.1) +
  scale_fill_manual(values = c("#fde725", "#5ec962", 
                               "#21918c", "#31688e", "#440154")) +
  theme_bw() +
  labs(x = "", y = "Expression Level", subtitle = "Atf4") +
  theme(
    plot.subtitle = element_text(face = "italic"),
    legend.position = "none",
    strip.text.x = element_text(face = "bold.italic"),
    strip.background = element_blank(),
    panel.spacing = unit(2, "lines"),
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  scale_y_continuous(breaks = pretty_breaks(n = 4, min.n = 0))
marker_gene_umap <- (((hspa5_violin / hspa5_umap + plot_layout(heights = c(0.8, 1)))) |
  (atf4_violin / atf4_umap + plot_layout(heights = c(0.8, 1))) |
  (ddit3_violin / ddit3_umap + plot_layout(heights = c(0.7, 1))))

marker_gene_umap

ggsave( filename = paste(results_dir, "Figure 2EFG.png", sep = ""), width = 9, height = 4.5, device = "png", dpi = 700)

WGCNA

# WGCNA is performed for variable genes identified by SCTransform
genes.keep <- VariableFeatures(classified_cells, assay = "SCT")

# create metacells for each treatment
seurat_list <- list()
for (group in unique(classified_cells$treatment)) {
  print(group)
  cur_seurat <- subset(classified_cells, treatment == group)
  cur_seurat <- cur_seurat[genes.keep, ]
  cur_metacell_seurat <- scWGCNA::construct_metacells(
    cur_seurat,
    name = group,
    k = 60,
    reduction = "umap",
    assay = "SCT", slot = "data"
  )
  seurat_list[[group]] <- cur_metacell_seurat
}
## [1] "TM+ 1hr"
## Overlap QC metrics:
## Cells per bin: 60
## Maximum shared cells bin-bin: 53
## Mean shared cells bin-bin: 3.42637544876351
## Median shared cells bin-bin: 0
## [1] "TM+ 4hr"
## Overlap QC metrics:
## Cells per bin: 60
## Maximum shared cells bin-bin: 53
## Mean shared cells bin-bin: 4.82433048433048
## Median shared cells bin-bin: 0
## [1] "TM+ 2hr"
## Overlap QC metrics:
## Cells per bin: 60
## Maximum shared cells bin-bin: 53
## Mean shared cells bin-bin: 2.84098432327775
## Median shared cells bin-bin: 0
## [1] "TM-"
## Overlap QC metrics:
## Cells per bin: 60
## Maximum shared cells bin-bin: 53
## Mean shared cells bin-bin: 3.5514853472501
## Median shared cells bin-bin: 0
## [1] "TM+ 8hr"
## Overlap QC metrics:
## Cells per bin: 60
## Maximum shared cells bin-bin: 53
## Mean shared cells bin-bin: 4.48521597174793
## Median shared cells bin-bin: 0
metacell_seurat <- merge(seurat_list[[1]], seurat_list[2:length(seurat_list)])

# plot showing the number of metacells for each treatment
metacell_counts <- metacell_seurat@meta.data %>%
  ggplot(aes(orig.ident, fill=factor(orig.ident))) +
  geom_bar(width = 0.75) +
  geom_text(aes(label=..count..),
            stat='count',
            size = 3, 
            position = position_stack(vjust = 1.07),
            color="black") +
  labs(x="", y="# Cells", fill="", title="") +
  theme_bw() +
  theme( legend.position = "none") +
  scale_fill_manual(values = viridis_cp)  

# umap for metacell data
#scale data
metacell_plot_data <- ScaleData(metacell_seurat, features = rownames(metacell_seurat))
## Centering and scaling data matrix
# PCA
metacell_plot_data  <- RunPCA(metacell_plot_data , features=rownames(metacell_seurat))
## PC_ 1 
## Positive:  ENSCGRG00015008468.1, ENSCGRG00015013447.1, ENSCGRG00015015091.1, ENSCGRG00015003428.1, ENSCGRG00015022604.1, ENSCGRG00015027322.1, ENSCGRG00015025683.1, ENSCGRG00015015991.1, ENSCGRG00015026521.1, ENSCGRG00015026028.1 
##     ENSCGRG00015010095.1, ENSCGRG00015009471.1, ENSCGRG00015022586.1, ENSCGRG00015019761.1, ENSCGRG00015014388.1, ENSCGRG00015006319.1, ENSCGRG00015020677.1, ENSCGRG00015016950.1, ENSCGRG00015019211.1, ENSCGRG00015022396.1 
##     ENSCGRG00015001471.1, ENSCGRG00015004273.1, ENSCGRG00015009189.1, ENSCGRG00015019578.1, ENSCGRG00015006545.1, ENSCGRG00015017417.1, ENSCGRG00015010327.1, ENSCGRG00015001538.1, ENSCGRG00015008983.1, ENSCGRG00015011727.1 
## Negative:  ENSCGRG00015017491.1, ENSCGRG00015025326.1, ENSCGRG00015004901.1, ENSCGRG00015002930.1, ENSCGRG00015004168.1, ENSCGRG00015003450.1, ENSCGRG00015014919.1, ENSCGRG00015027261.1, ENSCGRG00015011772.1, ENSCGRG00015012068.1 
##     ENSCGRG00015004938.1, ENSCGRG00015008420.1, ENSCGRG00015007225.1, ENSCGRG00015002356.1, ENSCGRG00015027413.1, ENSCGRG00015025730.1, ENSCGRG00015003611.1, ENSCGRG00015005685.1, ENSCGRG00015004826.1, ENSCGRG00015018744.1 
##     ENSCGRG00015022269.1, ENSCGRG00015020800.1, ENSCGRG00015016402.1, ENSCGRG00015009774.1, ENSCGRG00015028239.1, ENSCGRG00015025229.1, ENSCGRG00015024910.1, ENSCGRG00015012890.1, ENSCGRG00015008892.1, ENSCGRG00015019547.1 
## PC_ 2 
## Positive:  ENSCGRG00015004779.1, ENSCGRG00015016404.1, ENSCGRG00015016232.1, ENSCGRG00015008293.1, ENSCGRG00015010333.1, ENSCGRG00015013488.1, ENSCGRG00015002392.1, ENSCGRG00015027158.1, ENSCGRG00015024583.1, ENSCGRG00015002025.1 
##     ENSCGRG00015015874.1, ENSCGRG00015009532.1, ENSCGRG00015026887.1, ENSCGRG00015021269.1, ENSCGRG00015026619.1, ENSCGRG00015023319.1, ENSCGRG00015002199.1, ENSCGRG00015005679.1, ENSCGRG00015009688.1, ENSCGRG00015018981.1 
##     ENSCGRG00015013561.1, ENSCGRG00015007461.1, ENSCGRG00015011260.1, ENSCGRG00015011238.1, ENSCGRG00015011569.1, ENSCGRG00000000027.1, ENSCGRG00015009600.1, ENSCGRG00015009550.1, ENSCGRG00015019058.1, ENSCGRG00015017280.1 
## Negative:  ENSCGRG00015026684.1, ENSCGRG00015028559.1, ENSCGRG00015022215.1, ENSCGRG00015012070.1, ENSCGRG00015020336.1, ENSCGRG00015025312.1, ENSCGRG00015014090.1, ENSCGRG00015007431.1, ENSCGRG00015015800.1, ENSCGRG00015012028.1 
##     ENSCGRG00015010527.1, ENSCGRG00015006773.1, ENSCGRG00015018389.1, ENSCGRG00015008608.1, ENSCGRG00015019463.1, ENSCGRG00015003139.1, ENSCGRG00015018703.1, ENSCGRG00015028557.1, ENSCGRG00015015218.1, ENSCGRG00015000948.1 
##     ENSCGRG00015020294.1, ENSCGRG00015015195.1, ENSCGRG00015001960.1, ENSCGRG00015027289.1, ENSCGRG00015003621.1, ENSCGRG00015024868.1, ENSCGRG00015018475.1, ENSCGRG00015028364.1, ENSCGRG00015003776.1, ENSCGRG00015021944.1 
## PC_ 3 
## Positive:  ENSCGRG00015010052.1, ENSCGRG00015012800.1, ENSCGRG00015009992.1, ENSCGRG00015002922.1, ENSCGRG00015000344.1, ENSCGRG00015027997.1, ENSCGRG00015020418.1, ENSCGRG00015027476.1, ENSCGRG00015005824.1, ENSCGRG00015009154.1 
##     ENSCGRG00015013137.1, ENSCGRG00015022098.1, ENSCGRG00015018084.1, ENSCGRG00015009453.1, ENSCGRG00015010326.1, ENSCGRG00015016154.1, ENSCGRG00015005349.1, ENSCGRG00015021666.1, ENSCGRG00015019428.1, ENSCGRG00015028752.1 
##     ENSCGRG00015012185.1, ENSCGRG00015010009.1, ENSCGRG00015024743.1, ENSCGRG00015002920.1, ENSCGRG00015013663.1, ENSCGRG00015019762.1, ENSCGRG00015010581.1, ENSCGRG00015016760.1, ENSCGRG00015007815.1, ENSCGRG00015017984.1 
## Negative:  ENSCGRG00015024900.1, ENSCGRG00015019831.1, ENSCGRG00015028341.1, ENSCGRG00015001288.1, ENSCGRG00015018903.1, ENSCGRG00015019284.1, ENSCGRG00015025093.1, ENSCGRG00015028494.1, ENSCGRG00015000383.1, ENSCGRG00015018199.1 
##     ENSCGRG00015003721.1, ENSCGRG00015026559.1, ENSCGRG00015012642.1, ENSCGRG00015010498.1, ENSCGRG00015021979.1, ENSCGRG00015001553.1, ENSCGRG00015028233.1, ENSCGRG00015000477.1, ENSCGRG00015006794.1, ENSCGRG00015021481.1 
##     ENSCGRG00015004004.1, ENSCGRG00015011405.1, ENSCGRG00015025505.1, ENSCGRG00015014701.1, ENSCGRG00015011632.1, ENSCGRG00015028143.1, ENSCGRG00015021059.1, ENSCGRG00015022145.1, ENSCGRG00015001852.1, ENSCGRG00015020215.1 
## PC_ 4 
## Positive:  ENSCGRG00015000816.1, ENSCGRG00015021620.1, ENSCGRG00015017628.1, ENSCGRG00015003433.1, ENSCGRG00015021436.1, ENSCGRG00015000167.1, ENSCGRG00015000182.1, ENSCGRG00015010035.1, ENSCGRG00015025447.1, ENSCGRG00015026127.1 
##     ENSCGRG00015015531.1, ENSCGRG00015009348.1, ENSCGRG00000000025.1, ENSCGRG00015000220.1, ENSCGRG00015011563.1, ENSCGRG00015003448.1, ENSCGRG00015000209.1, ENSCGRG00015021874.1, ENSCGRG00015012901.1, ENSCGRG00015004060.1 
##     ENSCGRG00015006156.1, ENSCGRG00015006294.1, ENSCGRG00015002485.1, ENSCGRG00015014964.1, ENSCGRG00015000440.1, ENSCGRG00015025273.1, ENSCGRG00015027573.1, ENSCGRG00015000997.1, ENSCGRG00015000259.1, ENSCGRG00015002462.1 
## Negative:  ENSCGRG00015002131.1, ENSCGRG00015009345.1, ENSCGRG00015003514.1, ENSCGRG00015006070.1, ENSCGRG00015003424.1, ENSCGRG00015004734.1, ENSCGRG00015000795.1, ENSCGRG00015000949.1, ENSCGRG00015002393.1, ENSCGRG00015021473.1 
##     ENSCGRG00015007609.1, ENSCGRG00015000621.1, ENSCGRG00015007573.1, ENSCGRG00015002873.1, ENSCGRG00015024489.1, ENSCGRG00015024355.1, ENSCGRG00015000135.1, ENSCGRG00015011296.1, ENSCGRG00015016303.1, ENSCGRG00015024309.1 
##     ENSCGRG00015003824.1, ENSCGRG00015021370.1, ENSCGRG00015024410.1, ENSCGRG00015010385.1, ENSCGRG00015026554.1, ENSCGRG00015017121.1, ENSCGRG00015015835.1, ENSCGRG00015002196.1, ENSCGRG00015003913.1, ENSCGRG00015005005.1 
## PC_ 5 
## Positive:  ENSCGRG00015023249.1, ENSCGRG00015007326.1, ENSCGRG00015028213.1, ENSCGRG00015022328.1, ENSCGRG00015028424.1, ENSCGRG00015028327.1, ENSCGRG00015000582.1, ENSCGRG00015005885.1, ENSCGRG00015025701.1, ENSCGRG00015028328.1 
##     ENSCGRG00015027296.1, ENSCGRG00015019414.1, ENSCGRG00015023751.1, ENSCGRG00015020583.1, ENSCGRG00015022866.1, ENSCGRG00015000561.1, ENSCGRG00015012079.1, ENSCGRG00015011620.1, ENSCGRG00015006198.1, ENSCGRG00015002667.1 
##     ENSCGRG00015021619.1, ENSCGRG00015007350.1, ENSCGRG00015010116.1, ENSCGRG00015009254.1, ENSCGRG00015003194.1, ENSCGRG00015024665.1, ENSCGRG00015024540.1, ENSCGRG00015022515.1, ENSCGRG00015013169.1, ENSCGRG00015024365.1 
## Negative:  ENSCGRG00015013695.1, ENSCGRG00015008374.1, ENSCGRG00015002539.1, ENSCGRG00015007893.1, ENSCGRG00015007308.1, ENSCGRG00015028365.1, ENSCGRG00015007383.1, ENSCGRG00015024505.1, ENSCGRG00015017614.1, ENSCGRG00015005251.1 
##     ENSCGRG00015024337.1, ENSCGRG00015027773.1, ENSCGRG00015018368.1, ENSCGRG00015014989.1, ENSCGRG00015009804.1, ENSCGRG00015007258.1, ENSCGRG00015024349.1, ENSCGRG00015010053.1, ENSCGRG00015003888.1, ENSCGRG00015026221.1 
##     ENSCGRG00015023475.1, ENSCGRG00015011121.1, ENSCGRG00015018459.1, ENSCGRG00015007954.1, ENSCGRG00015002538.1, ENSCGRG00015004822.1, ENSCGRG00015007374.1, ENSCGRG00015007760.1, ENSCGRG00015003665.1, ENSCGRG00015020782.1
# UMAP
metacell_plot_data  <- RunUMAP(metacell_plot_data , reduction='pca', dims=1:15)
## 18:51:02 UMAP embedding parameters a = 0.9922 b = 1.112
## 18:51:02 Read 2159 rows and found 15 numeric columns
## 18:51:02 Using Annoy for neighbor search, n_neighbors = 30
## 18:51:02 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 18:51:03 Writing NN index file to temp file /var/folders/6h/n6dgj8v919lbs03811dwp1m00000gn/T//RtmpexwAHu/file7d7b67160cbd
## 18:51:03 Searching Annoy index using 1 thread, search_k = 3000
## 18:51:04 Annoy recall = 100%
## 18:51:05 Commencing smooth kNN distance calibration using 1 thread
## 18:51:06 Initializing from normalized Laplacian + noise
## 18:51:06 Commencing optimization for 500 epochs, with 66002 positive edges
## 18:51:10 Optimization finished
# plot
metacell_umap <- DimPlot(metacell_plot_data ,  reduction='umap', label=TRUE)

metacell_umap <- metacell_umap[[1]]$data %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = ident)) +
  geom_point(size = 0.1) +
  theme_bw() +
  coord_fixed() +
  scale_color_manual(values = viridis_cp) +
  labs(x = "UMAP 1", y = "UMAP 2", color = "", subtitle = "") +
  guides(color = guide_legend(override.aes = list(size = 2))) +
  theme(
    plot.title = element_text(size = 10, face = "bold.italic"),
    legend.position = "right",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

metacell_counts + metacell_umap + plot_annotation(tag_levels = "a") + plot_layout(widths = c(2,1), heights = c(1,1)) 

ggsave(filename = paste(results_dir, "Figure S10.png", sep = ""), width = 9, height = 5, device = "png", dpi = 700)
#nclusters <- length(unique(metacell_seurat$orig.ident))

genes.use <- rownames(metacell_seurat)

targets <- metacell_seurat@meta.data

group <- as.factor(metacell_seurat$orig.ident)

#create expression matrix
datExpr <- as.data.frame(GetAssayData(metacell_seurat, assay = "RNA", slot = "data")[genes.use, ])

# transpose
datExpr <- as.data.frame(t(datExpr))

datExpr <- datExpr[, goodGenes(datExpr)]

# Choose a set of soft-thresholding powers
powers <- c(seq(1, 10, by = 1), seq(12, 50, by = 2))

# assess metrics over selected soft threshold range
powerTable <- list(
  data = pickSoftThreshold(
    datExpr,
    powerVector = powers,
    verbose = 100,
    networkType = "signed",
    corFnc = "cor"
  )[[2]]
)
## pickSoftThreshold: will use block size 3000.
##  pickSoftThreshold: calculating connectivity for given powers...
##    ..working on genes 1 through 3000 of 3000
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq  mean.k. median.k. max.k.
## 1      1   0.4450 11.700          0.875 1530.000  1.53e+03 1670.0
## 2      2   0.0135  0.992          0.795  824.000  8.06e+02 1020.0
## 3      3   0.0181 -0.683          0.440  465.000  4.41e+02  662.0
## 4      4   0.6000 -1.960          0.685  274.000  2.51e+02  461.0
## 5      5   0.8420 -1.900          0.865  169.000  1.50e+02  345.0
## 6      6   0.9100 -1.660          0.932  109.000  9.20e+01  270.0
## 7      7   0.9190 -1.530          0.948   73.900  5.80e+01  221.0
## 8      8   0.9240 -1.470          0.968   52.000  3.74e+01  188.0
## 9      9   0.9260 -1.410          0.979   38.000  2.46e+01  163.0
## 10    10   0.9250 -1.370          0.986   28.700  1.67e+01  143.0
## 11    12   0.9330 -1.340          0.996   17.700  7.88e+00  115.0
## 12    14   0.9330 -1.350          0.994   11.800  4.03e+00   95.6
## 13    16   0.9320 -1.360          0.996    8.310  2.17e+00   81.0
## 14    18   0.9420 -1.370          0.994    6.090  1.18e+00   69.6
## 15    20   0.9540 -1.380          0.995    4.590  6.64e-01   60.5
## 16    22   0.9540 -1.390          0.992    3.550  3.78e-01   53.1
## 17    24   0.9600 -1.390          0.989    2.790  2.21e-01   46.9
## 18    26   0.9620 -1.400          0.986    2.230  1.30e-01   41.7
## 19    28   0.9520 -1.420          0.971    1.800  8.13e-02   37.3
## 20    30   0.9650 -1.420          0.982    1.480  5.06e-02   33.5
## 21    32   0.9650 -1.420          0.984    1.220  3.15e-02   30.2
## 22    34   0.9620 -1.420          0.977    1.020  2.02e-02   27.3
## 23    36   0.9650 -1.410          0.977    0.861  1.30e-02   24.8
## 24    38   0.9690 -1.410          0.975    0.730  8.25e-03   22.6
## 25    40   0.9660 -1.410          0.970    0.624  5.38e-03   20.7
## 26    42   0.9730 -1.410          0.977    0.536  3.49e-03   18.9
## 27    44   0.9710 -1.400          0.974    0.464  2.25e-03   17.4
## 28    46   0.9710 -1.390          0.972    0.403  1.48e-03   16.0
## 29    48   0.9600 -1.390          0.960    0.352  9.79e-04   14.7
## 30    50   0.9560 -1.390          0.955    0.309  6.48e-04   13.6
# plot 
figure_s11a <- powerTable$data %>%
  ggplot(aes(x=Power, y=SFT.R.sq)) +
  geom_point(size=0.1) +
  labs(y="SFT fit", x="Power") +
  geom_vline(xintercept = 7, linetype=11, size=0.2, color="red") +
  theme_bw()
  
figure_s11b <- powerTable$data %>%
  ggplot(aes(x=Power, y=mean.k.)) +
  geom_point(size=0.1) +
  labs(y="Mean connectivity", x="Power") +
  geom_vline(xintercept = 7, linetype=11, size=0.2, color="red") +
  theme_bw() 


figure_s11a + figure_s11b +plot_annotation(tag_levels = "a")

ggsave(filename = paste(results_dir, "Figure S11.png", sep = ""), width = 6, height = 2, device = "png", dpi = 700)
# selected soft threshold
softPower <- 7

nSets <- 1
setLabels <- "treatment"
shortLabels <- setLabels

multiExpr <- list()
multiExpr[["TM"]] <- list(data = datExpr)

checkSets(multiExpr) # check data size
## $nSets
## [1] 1
## 
## $nGenes
## [1] 3000
## 
## $nSamples
## [1] 2159
## 
## $structureOK
## [1] TRUE
# construct network
net <- blockwiseConsensusModules(multiExpr,
  blocks = NULL,
  maxBlockSize = 30000, ## This should be set to a smaller size if the user has limited RAM
  randomSeed = 12345,
  corType = "pearson",
  power = softPower,
  consensusQuantile = 0.3,
  networkType = "signed",
  TOMType = "unsigned",
  TOMDenom = "min",
  scaleTOMs = TRUE, scaleQuantile = 0.8,
  sampleForScaling = TRUE, sampleForScalingFactor = 1000,
  useDiskCache = TRUE, chunkSize = NULL,
  deepSplit = 4,
  pamStage = FALSE,
  detectCutHeight = 0.995,
  minModuleSize = 50,
  mergeCutHeight = 0.1,
  saveConsensusTOMs = TRUE,
  consensusTOMFilePattern = "ConsensusTOM-block.%b.rda"
)
##  Calculating consensus modules and module eigengenes block-wise from all genes
##  Calculating topological overlaps block-wise from all genes
##    Flagging genes and samples with too many missing values...
##     ..step 1
##     TOM calculation: adjacency..
##     ..will not use multithreading.
##      Fraction of slow calculations: 0.000000
##     ..connectivity..
##     ..matrix multiplication (system BLAS)..
##     ..normalization..
##     ..done.
##  ..Working on block 1 .
##  ..Working on block 1 .
##  ..merging consensus modules that are too close..
consMEs <- net$multiMEs
moduleLabels <- net$colors
table(moduleLabels)
## moduleLabels
##     black      blue     brown     green      grey   magenta      pink       red 
##        79       386       259        98      1306        60        70        85 
## turquoise    yellow 
##       483       174
# change the base colors to something better
moduleLabels <- moduleLabels %>%
  revalue(c(
    blue = "#1F78B4",
    brown = "#33A02C",
    magenta = "#E31A1C",
    yellow = "#FFFF99",
    green = "#6A3D9A",
    grey = "#D3D3D3",
    pink = "#FF7F00",
    red = "#B15928",
    turquoise = "#3acabb"
  ))

# Convert the numeric labels to color labels
moduleColors <- as.character(moduleLabels)
length(unique(moduleColors))
## [1] 10
consTree <- net$dendrograms[[1]]

# generatate module eigengenes
MEs <- moduleEigengenes(multiExpr[[1]]$data, colors = moduleColors, nPC = 1)$eigengenes
MEs <- orderMEs(MEs)
meInfo <- data.frame(rownames(datExpr), MEs)
colnames(meInfo)[1] <- "SampleID"

# calculate intramodular connectivity
KMEs <- signedKME(datExpr, MEs, outputColumnName = "kME", corFnc = "bicor")
## Warning in bicor(datExpr, datME, , use = "p"): bicor: zero MAD in variable 'x'.
## Pearson correlation was used for individual columns with zero (or missing) MAD.
# compile into a module metadata table
geneInfo <- as.data.frame(cbind(colnames(datExpr), moduleColors, KMEs))

# how many modules did we get?
nmodules <- length(unique(moduleColors))

# merged gene symbol column
colnames(geneInfo)[1] <- "GeneSymbol"
colnames(geneInfo)[2] <- "Initially.Assigned.Module.Color"
PCvalues <- MEs


# save dendrogram figure
png(file = paste(results_dir, "Figure_3A.png", sep = ""), units = "in", width = 6, height = 3, res = 700)

plotDendroAndColors(consTree, moduleColors, "Module colors",
  dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05,
  main = paste0("")
)

dev.off()
## quartz_off_screen 
##                 2
# show dendrogram figure
plotDendroAndColors(consTree, moduleColors, "Module colors",
  dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05,
  main = paste0("")
)

# figure to show number of genes in each module

# switch the colors back for plot
num_gene_mod_counts <- moduleColors %>%
  revalue(c(
    "#1F78B4" = "blue",
    "#33A02C" = "green",
    "#E31A1C" = "red",
    "#FFFF99" = "yellow",
    "#6A3D9A" = "purple",
    "#D3D3D3" = "grey",
    "#FF7F00" = "orange",
    "#B15928" = "brown",
    "#3acabb" = "turquoise"
  ))

# plot
data.frame(table(num_gene_mod_counts)) %>%
  filter(num_gene_mod_counts != "grey") %>%
  ggplot(aes(x = num_gene_mod_counts, y = Freq, label = Freq, fill = num_gene_mod_counts)) +
  geom_bar(stat = "identity", colour = "grey") +
  geom_label(
    fill = "white",
    size = 3,
    # position = position_stack(vjust = 1.1),
    color = "black"
  ) +
  theme_bw() +
  labs(x = "", y = "# Coexpressed genes") +
  theme(legend.position = "none") +
  scale_fill_manual(values = c(
    "black", "#1F78B4", "#B15928", "#33A02C", "#FF7F00", "#6A3D9A", "#E31A1C",
    "#3acabb", "#FFFF99"
  ))

ggsave(
  filename = paste(results_dir, "Figure 3B.png", sep = ""),
  width = 5,
  height = 3,
  device = "png",
  dpi = 700
)

Now we annotate each of the 3,000 genes used for WGCNA analysis.

ensembl <- useEnsembl(
  biomart = "ensembl",
  dataset = "cgpicr_gene_ensembl",
  version = "102"
)

biomart <- getBM(
  attributes = c(
    "ensembl_gene_id_version", "external_gene_name",
    "description", "gene_biotype", "entrezgene_accession"
  ),
  filters = "ensembl_gene_id_version",
  values = geneInfo$GeneSymbol,
  mart = ensembl,
  uniqueRows = T
)

# new data frame
geneInfo_annotated <- geneInfo

# switch colours back to names
geneInfo_annotated$Initially.Assigned.Module.Color <- revalue(
  geneInfo_annotated$Initially.Assigned.Module.Color,
  c(
    "#1F78B4" = "blue",
    "#33A02C" = "green",
    "#E31A1C" = "red",
    "#FFFF99" = "yellow",
    "#6A3D9A" = "purple",
    "#D3D3D3" = "grey",
    "#FF7F00" = "orange",
    "#B15928" = "brown",
    "#3acabb" = "turquoise"
  )
)

new_colnames_colour_names <- revalue(
  colnames(geneInfo_annotated),
  c(
    "kME#1F78B4" = "kMEblue",
    "kME#33A02C" = "kMEgreen",
    "kME#E31A1C" = "kMEred",
    "kME#FFFF99" = "kMEyellow",
    "kME#6A3D9A" = "kMEpurple",
    "kME#D3D3D3" = "kMEgrey",
    "kME#FF7F00" = "kMEorange",
    "kME#B15928" = "kMEbrown",
    "kME#3acabb" = "kMEturquoise"
  )
)

# make the annotated gene information table
colnames(geneInfo_annotated) <- new_colnames_colour_names

geneInfo_annotated <- geneInfo_annotated %>%
  dplyr::rename(
    ensembl_gene_id_version = GeneSymbol,
    module_color = Initially.Assigned.Module.Color
  ) %>%
  left_join(biomart, by = "ensembl_gene_id_version") %>% # add annotations
  mutate_all(na_if, "") %>%
  distinct(ensembl_gene_id_version, .keep_all = T) %>%
  mutate(gene_symbol = coalesce(external_gene_name, 
                                entrezgene_accession)) %>%
  dplyr::select(c(-entrezgene_accession, 
                  -external_gene_name)) %>%
  dplyr::select(c(ensembl_gene_id_version, 
                  gene_biotype, 
                  gene_symbol, 
                  description, 
                  everything())) %>%
  dplyr::arrange(module_color)





filename <- paste(results_dir, "Table S4.xlsx",
  sep = ""
)

write_xlsx(list(
  a = geneInfo_annotated
),
path = filename,
format_headers = TRUE
)

# show the WGCNA outs
datatable(geneInfo_annotated, rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=T))

Plot the module eigengenes for each module against the sample IDs to assess correlations in expression.

plot_df <- cbind(dplyr::select(targets, c(orig.ident)), PCvalues)


new_colnames_plot_df <- revalue(colnames(plot_df), 
                           c("ME#1F78B4" = "MEblue",
                             "ME#33A02C" = "MEgreen",
                             "ME#E31A1C" = "MEred",
                             "ME#FFFF99" = "MEyellow",
                             "ME#6A3D9A" = "MEpurple",
                             "ME#D3D3D3" = "MEgrey",
                             "ME#FF7F00" = "MEorange",
                             "ME#B15928" = "MEbrown",
                             "ME#3acabb" = "MEturquoise"))

colnames(plot_df) <- new_colnames_plot_df

plot_df <- reshape2::melt(plot_df, id.vars = c("orig.ident"))

plot_df$orig.ident <- factor(plot_df$orig.ident, 
                             levels = c("TM-", "TM+ 1hr", "TM+ 2hr", "TM+ 4hr", "TM+ 8hr"))

colors <- sub("ME", "", as.character(levels(plot_df$variable)))

wgcna_boxplot_full <- plot_df %>%
  filter(variable != "MEgrey") %>%
  ggplot(aes(x = orig.ident, y = value, fill = orig.ident)) +
  scale_fill_manual(values = c("#fde725", "#5ec962", "#21918c", "#31688e", "#440154")) +
  geom_boxplot(alpha=0.5) +
  RotatedAxis() +
  ylab("Module Eigengene") +
  xlab("") +
  theme_bw() +
  theme(
    legend.position = "none",
    strip.text.x = element_text(face = "bold"),
    strip.background = element_blank(),
    axis.text.x = element_text(angle = 35, vjust = 1, hjust = 1)
  ) +
  facet_wrap(~variable, scales = "free", ncol = 3)

wgcna_boxplot_full

ggsave(plot = wgcna_boxplot_full, filename = paste(results_dir, "Figure S9.png", sep = ""), width = 9, height = 9, device = "png", dpi = 700)

The ME expression of the blue, green and red coexpression modules seem to be related to TM treatment over the time course. Let’s see if any biological processes are enriched.

enrichdir <- paste0(results_dir, "/enrichment_analysis/")
suppressMessages(if (file.exists(enrichdir)) {
  unlink(enrichdir, recursive = T)
})
if (!dir.exists(enrichdir)) {
  dir.create(enrichdir)
}


listGeneSet(organism = "mmusculus")
##                                               name
## 1                  geneontology_Biological_Process
## 2      geneontology_Biological_Process_noRedundant
## 3                  geneontology_Cellular_Component
## 4      geneontology_Cellular_Component_noRedundant
## 5                  geneontology_Molecular_Function
## 6      geneontology_Molecular_Function_noRedundant
## 7                                     pathway_KEGG
## 8                                  pathway_Panther
## 9                                 pathway_Reactome
## 10                             pathway_Wikipathway
## 11                                   network_CORUM
## 12                                  network_CORUMA
## 13                      network_Kinase_phosphosite
## 14                           network_Kinase_target
## 15                             network_PPI_BIOGRID
## 16                                network_PTMsigDB
## 17             network_Transcription_Factor_target
## 18                            network_miRNA_target
## 19          phenotype_Mammalian_Phenotype_Ontology
## 20             chromosomalLocation_CytogeneticBand
## 21         community-contributed_5htGeneSets_Conte
## 22 community-contributed_MuscleGeneSets_Duddy_2017
##                                                                                                                                                                                                                                                                                                                          description
## 1                                                                                                                                                                                                                                    The gene ontology biological process database was downloaded from http://www.geneontology.org/.
## 2                          The gene ontology biological process database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.
## 3                                                                                                                                                                                                                                    The gene ontology cellular component database was downloaded from http://www.geneontology.org/.
## 4                          The gene ontology cellular component database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.
## 5                                                                                                                                                                                                                                    The gene ontology molecular function database was downloaded from http://www.geneontology.org/.
## 6                          The gene ontology molecular function database was downloaded from http://www.geneontology.org/. Then, we only contain the non-redundant categories by selecting the most general categories in each branch of the GO DAG structure from all categories with the number of annotated genes from 20 to 500.
## 7                                                                                                                                                                                                                                                                 The KEGG pathway database was downloaded from http://www.kegg.jp/.
## 8                                                                                                                                                                                                                                                The PANTHER pathway database was downloaded from http://www.pantherdb.org/pathway/.
## 9                                                                                                                                                                                                                                                        The Reactome pathway database was downloaded from http://www.reactome.org/.
## 10                                                                                                                                                                                                                                                         The Wikipathway database was downloaded from http://www.wikipathway.org/.
## 11                                                                                                                                                                                                                       The core complex subunits were downloaded from CORUM database (https://mips.helmholtz-muenchen.de/corum/#).
## 12                                                                                                                                                                                                                                                                                                                                  
## 13 The relationship between Kinase and the corresponding protein phosposite position was from RegPhos (http://140.138.144.141/~RegPhos/). Then, we got the 15-mer sequence window around the phosphorylated site of the peptide (up and down 7 amino acid) from the Uniprot website (http://www.uniprot.org/) as the identifier tag.
## 14                                                                                                                                          The relationship between Kinase and its targets was from PhosphositePlus (http://www.phosphosite.org/) and the phosphorylation relationships from a in-house combined signaling network.
## 15                                                                      The protein-protein interaction (PPI) network was downloaded from BIOGRID (https://thebiogrid.org/). Then, we used the NetSAM R package (http://bioconductor.org/packages/release/bioc/html/NetSAM.html) to identify the hierarchical co-expression modules.
## 16                                                                                                                                                                                                                                                                                                                                  
## 17                                                                                                                                                                                        The relationship between transcription factor and its targets was downloaded from MsigDB (http://software.broadinstitute.org/gsea/msigdb).
## 18                                                                                                                                                                                                       The relationship between miRNA and its targets was downloaded from MsigDB (http://software.broadinstitute.org/gsea/msigdb).
## 19                                                                                                                                                                                                                                 The phenotype ontology information for mouse was downloaded from http://www.informatics.jax.org/.
## 20                                                                                                                                                                                                                                                                                                                                  
## 21                                                                                                                                                                                                                                                                                                                                  
## 22                                                                                                                                                                                                                                                                                                  Muscle Gene Sets (v3) Duddy 2017
##            idType
## 1      entrezgene
## 2      entrezgene
## 3      entrezgene
## 4      entrezgene
## 5      entrezgene
## 6      entrezgene
## 7      entrezgene
## 8      entrezgene
## 9      entrezgene
## 10     entrezgene
## 11     entrezgene
## 12     entrezgene
## 13 phosphositeSeq
## 14 phosphositeSeq
## 15     entrezgene
## 16 phosphositeSeq
## 17     entrezgene
## 18     entrezgene
## 19     entrezgene
## 20     entrezgene
## 21     entrezgene
## 22     entrezgene
selected_colors <- c("blue","green","red")

for (color in selected_colors) {
  current_genes <- geneInfo_annotated %>%
    filter(module_color == color) %>%
    dplyr::select(gene_symbol)

  write(current_genes$gene_symbol, file = paste0(enrichdir, color, "_genes.txt"))
}

enrich_result <- WebGestaltRBatch(
  enrichMethod = "ORA",
  organism = "mmusculus",
  enrichDatabase = c("geneontology_Biological_Process"),
  enrichDatabaseType = "genesymbol",
  interestGeneFolder = enrichdir,
  interestGeneType = "genesymbol",
  referenceSet = "genome",
  minNum = 10,
  maxNum = 500,
  sigMethod = "fdr",
  fdrMethod = "BH",
  fdrThr = 0.05,
  topThr = 10,
  reportNum = 20,
  perNum = 1000,
  projectName = "SBO ER stress",
  isOutput = TRUE,
  outputDirectory = enrichdir,
  dagColor = "continuous",
  setCoverNum = 10,
  networkConstructionMethod = NULL,
  neighborNum = 10,
  highlightType = "Seeds",
  highlightSeedNum = 10,
  nThreads = 32
)
## Process file: ../results//enrichment_analysis//blue_genes.txt
## Loading the functional categories...
## Loading the ID list...
## Loading the reference list...
## Summarizing the input ID list by GO Slim data...
## Performing the enrichment analysis...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the ../results//enrichment_analysis//Project_blue_genes!
## Process file: ../results//enrichment_analysis//green_genes.txt
## Loading the functional categories...
## Loading the ID list...
## Loading the reference list...
## Summarizing the input ID list by GO Slim data...
## Performing the enrichment analysis...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## End weighted set cover...
## Generate the final report...
## Results can be found in the ../results//enrichment_analysis//Project_green_genes!
## Process file: ../results//enrichment_analysis//red_genes.txt
## Loading the functional categories...
## Loading the ID list...
## Loading the reference list...
## Summarizing the input ID list by GO Slim data...
## Performing the enrichment analysis...
## Begin affinity propagation...
## End affinity propagation...
## Begin weighted set cover...
## Remain is 0, ending weighted set cover
## Generate the final report...
## Results can be found in the ../results//enrichment_analysis//Project_red_genes!
filename <- paste(results_dir, "Table S5.xlsx",
  sep = ""
)

write_xlsx(list(
   a=enrich_result[[1]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)],
   b=enrich_result[[2]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)],
   c=enrich_result[[3]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)]
 ),
 path = filename,
 format_headers = TRUE
 )

plot the ME and enrichment for blue module genes

wgcna_boxplot_blue <- plot_df %>%
  mutate(new_sample_label=case_when(
    orig.ident == "TM-" ~ "TM-",
    orig.ident == "TM+ 1hr" ~ "TM+\n1hr",
    orig.ident == "TM+ 2hr" ~ "TM+\n2hr",
    orig.ident == "TM+ 4hr" ~ "TM+\n4hr",
    orig.ident == "TM+ 8hr" ~ "TM+\n8hr",
    )) %>%
  filter( variable == "MEblue") %>%
  ggplot(aes(x = new_sample_label, y = value, fill = orig.ident)) +
  scale_fill_manual(values = c("#fde725", "#5ec962", "#21918c", "#31688e", "#440154")) +
  geom_boxplot(alpha=0.5) + 
  labs(x="", y="Module Eigengene", title = "Blue Module") +
  RotatedAxis() +
  theme_bw() +
  theme(
    legend.position = "none"
  ) 

blue_module_enrichment_plot <- enrich_result[[1]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)] %>%
  mutate(type = "MEblue") %>%
mutate(FDR = case_when(FDR == 0 ~ 2.2e-16, TRUE ~ FDR)) %>%
  mutate(new_description = str_wrap(description, width = 50)) %>%
  mutate(new_order = fct_reorder(new_description, FDR)) %>%
  ggplot(aes(y = -log10(FDR), x =new_order, fill=type, label=overlap)) +
  geom_bar(stat="identity", width = 0.75) +
     scale_y_continuous(expand = c(0, 0), limits = c(0, -log10(2.2e-16)+ 0.05)) +
  geom_text(aes(label = overlap), vjust = 0.5,hjust = 1.1, colour = "white") +
  #scale_fill_viridis() +
  theme_bw() +
  labs(x = "",y = "-log10(FDR)", title = "GO Biological Process") +
  scale_fill_manual(values = "#1F78B4") +
  theme(strip.text.x = element_text(face = "bold"),strip.background = element_blank(), legend.position = "none",
        axis.text.x = element_text( size = 6),
        axis.text.y = element_text( size = 9),
        panel.spacing = unit(2, "lines")) +
    scale_x_discrete(limits=rev) +
  coord_flip()

fig3_cd <- wgcna_boxplot_blue + blue_module_enrichment_plot + plot_layout(widths=c(1,2.5))
fig3_cd

wgcna_boxplot_green <- plot_df %>%
  mutate(new_sample_label=case_when(
    orig.ident == "TM-" ~ "TM-",
    orig.ident == "TM+ 1hr" ~ "TM+\n1hr",
    orig.ident == "TM+ 2hr" ~ "TM+\n2hr",
    orig.ident == "TM+ 4hr" ~ "TM+\n4hr",
    orig.ident == "TM+ 8hr" ~ "TM+\n8hr",
    )) %>%
  filter( variable == "MEgreen") %>%
  ggplot(aes(x = new_sample_label, y = value, fill = orig.ident)) +
  scale_fill_manual(values = c("#fde725", "#5ec962", "#21918c", "#31688e", "#440154")) +
  geom_boxplot(alpha=0.5) + 
  labs(x="", y="Module Eigengene", title = "Green Module") +
  RotatedAxis() +
  theme_bw() +
  theme(
    legend.position = "none"
  ) 

green_module_enrichment_plot <- enrich_result[[2]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)] %>%
  mutate(type = "MEgreen") %>%
mutate(FDR = case_when(FDR == 0 ~ 2.2e-16, TRUE ~ FDR)) %>%
  mutate(new_description = str_wrap(description, width = 50)) %>%
  mutate(new_order = fct_reorder(new_description, FDR)) %>%
  ggplot(aes(y = -log10(FDR), x =new_order, fill=type, label=overlap)) +
  geom_bar(stat="identity", width = 0.75) +
     scale_y_continuous(expand = c(0, 0), limits = c(0, -log10(1.012793e-08)+0.05)) +
  geom_text(aes(label = overlap), vjust = 0.5,hjust = 1.1, colour = "white") +
  #scale_fill_viridis() +
  theme_bw() +
  labs(x = "",y = "-log10(FDR)", title = "GO Biological Process") +
  scale_fill_manual(values = "#33A02C") +
  theme(strip.text.x = element_text(face = "bold"),strip.background = element_blank(), legend.position = "none",
        axis.text.x = element_text( size = 6),
        panel.spacing = unit(2, "lines")) +
    scale_x_discrete(limits=rev) +
  coord_flip()

fig3_ef <- wgcna_boxplot_green + green_module_enrichment_plot + plot_layout(widths=c(1,2.5))
fig3_ef

wgcna_boxplot_red <- plot_df %>%
  mutate(new_sample_label=case_when(
    orig.ident == "TM-" ~ "TM-",
    orig.ident == "TM+ 1hr" ~ "TM+\n1hr",
    orig.ident == "TM+ 2hr" ~ "TM+\n2hr",
    orig.ident == "TM+ 4hr" ~ "TM+\n4hr",
    orig.ident == "TM+ 8hr" ~ "TM+\n8hr",
    )) %>%
  filter( variable == "MEred") %>%
  ggplot(aes(x = new_sample_label, y = value, fill = orig.ident)) +
  scale_fill_manual(values = c("#fde725", "#5ec962", "#21918c", "#31688e", "#440154")) +
  geom_boxplot(alpha=0.5) + 
  labs(x="", y="Module Eigengene", title = "Red Module") +
  RotatedAxis() +
  theme_bw() +
  theme(
    legend.position = "none"
  ) 

red_module_enrichment_plot <- enrich_result[[3]]$enrichResult[1:8, c(1, 2, 4, 5, 7, 8, 9, 11)] %>%
  mutate(type = "MEred") %>%
mutate(FDR = case_when(FDR == 0 ~ 2.2e-16, TRUE ~ FDR)) %>%
  mutate(new_description = str_wrap(description, width = 50)) %>%
  mutate(new_order = fct_reorder(new_description, FDR)) %>%
  ggplot(aes(y = -log10(FDR), x =new_order, fill=type, label=overlap)) +
  geom_bar(stat="identity", width = 0.75) +
     scale_y_continuous(expand = c(0, 0), limits = c(0, -log10(3.455569e-10)+0.05)) +
  geom_text(aes(label = overlap), vjust = 0.5,hjust = 1.1, colour = "white") +
  #scale_fill_viridis() +
  theme_bw() +
  labs(x = "",y = "-log10(FDR)", title = "GO Biological Process") +
  scale_fill_manual(values = "#E31A1C") +
  theme(strip.text.x = element_text(face = "bold"),strip.background = element_blank(), legend.position = "none",
        axis.text.x = element_text( size = 6),
        panel.spacing = unit(2, "lines")) +
    scale_x_discrete(limits=rev) +
  coord_flip()

fig3_gh <- wgcna_boxplot_red + red_module_enrichment_plot + plot_layout(widths=c(1,2.5))
fig3_gh

fig3_cd / fig3_ef / fig3_gh

ggsave(filename = paste(results_dir, "Figure 3CDEFGH.png", sep = ""), width = 10, height = 9.75, device = "png", dpi = 700)
selected_genes <-geneInfo_annotated %>%
 # filter(gene_symbol %in% cell_cycle_phase_transition) %>%
  arrange(-kMEblue) %>%
  top_n(9, kMEblue)
  
for (i in 1:dim(selected_genes)[1]) {

   umap_data <- FeaturePlot(classified_cells, features = selected_genes$ensembl_gene_id_version[i])$data %>%
  mutate(gene = selected_genes$gene_symbol[i]) %>%
   dplyr::rename(expression = selected_genes$ensembl_gene_id_version[i])
 
umap_plot <- umap_data %>%
  filter(gene == selected_genes$gene_symbol[i]) %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.1) +
  scale_color_viridis(option = "turbo") +
  theme_bw() +
  #scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = paste0(selected_genes$gene_symbol[i],"\nExpression \nlevel"), 
       title= selected_genes$gene_symbol[i], 
       subtitle = paste0(" kMEblue = ", signif(selected_genes$kMEblue[i],digits = 3))) +
  theme(
    plot.title = element_text(face = "italic", size=12),
    plot.subtitle = element_text(size=10),
    legend.title = element_text(size=10)
    ) 

assign(paste0("fig_s13", LETTERS[i]), umap_plot)
}

(fig_s13A + fig_s13B + fig_s13C) / 
  (fig_s13D + fig_s13E + fig_s13F) /
    (fig_s13G + fig_s13H + fig_s13I) 

ggsave(filename = paste(results_dir, "Figure S13.png", sep = ""), width = 9, height = 7, device = "png", dpi = 700)
selected_genes <-geneInfo_annotated %>%
 # filter(gene_symbol %in% cell_cycle_phase_transition) %>%
  arrange(-kMEgreen) %>%
  top_n(9, kMEgreen)
  
for (i in 1:dim(selected_genes)[1]) {

   umap_data <- FeaturePlot(classified_cells, features = selected_genes$ensembl_gene_id_version[i])$data %>%
  mutate(gene = selected_genes$gene_symbol[i]) %>%
   dplyr::rename(expression = selected_genes$ensembl_gene_id_version[i])
 
umap_plot <- umap_data %>%
  filter(gene == selected_genes$gene_symbol[i]) %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.1) +
  scale_color_viridis(option = "turbo") +
  theme_bw() +
  #scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = paste0(selected_genes$gene_symbol[i],"\nExpression \nlevel"), 
       title= selected_genes$gene_symbol[i], 
       subtitle = paste0(" kMEgreen = ", signif(selected_genes$kMEgreen[i],digits = 3))) +
  theme(
    plot.title = element_text(face = "italic", size=12),
    plot.subtitle = element_text(size=10),
    legend.title = element_text(size=10)
    ) 

assign(paste0("fig_s14", LETTERS[i]), umap_plot)
}

(fig_s14A + fig_s14B + fig_s14C) / 
  (fig_s14D + fig_s14E + fig_s14F) /
    (fig_s14G + fig_s14H + fig_s14I) 

ggsave(filename = paste(results_dir, "Figure S14.png", sep = ""), width = 9, height = 7, device = "png", dpi = 700)
selected_genes <-geneInfo_annotated %>%
  #filter(gene_symbol %in% c("Hmgcs1","Hmgcr","Mvd","Pmvk","Fdft1","Msmo1", "Erg28", "Slc27a3", "Gamt")) %>%
  filter(!is.na(gene_symbol)) %>%
  arrange(-kMEred) %>%
  top_n(9, kMEred)
  
for (i in 1:dim(selected_genes)[1]) {

   umap_data <- FeaturePlot(classified_cells, features = selected_genes$ensembl_gene_id_version[i])$data %>%
  mutate(gene = selected_genes$gene_symbol[i]) %>%
   dplyr::rename(expression = selected_genes$ensembl_gene_id_version[i])
 
umap_plot <- umap_data %>%
  filter(gene == selected_genes$gene_symbol[i]) %>%
  ggplot(aes(x = UMAP_1, y = UMAP_2, color = expression)) +
  geom_point(size = 0.1) +
  scale_color_viridis(option = "turbo") +
  theme_bw() +
  #scale_x_reverse() +
  labs(x = "UMAP 2", y = "UMAP 1", color = paste0(selected_genes$gene_symbol[i],"\nExpression \nlevel"), 
       title= selected_genes$gene_symbol[i], 
       subtitle = paste0(" kMEred = ", signif(selected_genes$kMEred[i],digits = 3))) +
  theme(
    plot.title = element_text(face = "italic", size=12),
    plot.subtitle = element_text(size=10),
    legend.title = element_text(size=10)
    ) 

assign(paste0("fig_s15", LETTERS[i]), umap_plot)
}

(fig_s15A + fig_s15B + fig_s15C) / 
  (fig_s15D + fig_s15E + fig_s15F) /
    (fig_s15G + fig_s15H + fig_s15I) 

ggsave(filename = paste(results_dir, "Figure S15.png", sep = ""), width = 9, height = 7, device = "png", dpi = 700)
geneInfo_annotated %>%
  filter(module_color == "red") %>%
  arrange(gene_symbol)
##    ensembl_gene_id_version   gene_biotype  gene_symbol
## 1     ENSCGRG00015004639.1 protein_coding        Abcd1
## 2     ENSCGRG00015013937.1 protein_coding     AI413582
## 3     ENSCGRG00015025505.1 protein_coding         Arf5
## 4     ENSCGRG00015016610.1 protein_coding       Arrdc3
## 5     ENSCGRG00015019097.1 protein_coding      Bhlhe40
## 6     ENSCGRG00015004628.1 protein_coding         Bst2
## 7     ENSCGRG00015003306.1 protein_coding        Btbd2
## 8     ENSCGRG00015019831.1 protein_coding        Calm1
## 9     ENSCGRG00015001315.1 protein_coding       Card19
## 10    ENSCGRG00015008170.1 protein_coding       Ccdc34
## 11    ENSCGRG00015012961.1 protein_coding         Ccl2
## 12    ENSCGRG00015003389.1 protein_coding      Csnk1g2
## 13    ENSCGRG00015017818.1 protein_coding          Ddt
## 14    ENSCGRG00015007995.1 protein_coding         Dmpk
## 15    ENSCGRG00015006794.1 protein_coding          Ebp
## 16    ENSCGRG00015001510.1 protein_coding        Erg28
## 17    ENSCGRG00015025789.1 protein_coding         Etv5
## 18    ENSCGRG00015010409.1 protein_coding        Fdft1
## 19    ENSCGRG00015006058.1 protein_coding         Fhl3
## 20    ENSCGRG00015004983.1 protein_coding         Gamt
## 21    ENSCGRG00015026397.1 protein_coding        Gfpt2
## 22    ENSCGRG00015028575.1 protein_coding       Ggnbp1
## 23    ENSCGRG00015011405.1 protein_coding      Hcfc1r1
## 24    ENSCGRG00015009730.1 protein_coding        Hmgcr
## 25    ENSCGRG00015020108.1 protein_coding       Hmgcs1
## 26    ENSCGRG00015014924.1 protein_coding         Idi1
## 27    ENSCGRG00015027573.1 protein_coding       Ifitm3
## 28    ENSCGRG00015012498.1 protein_coding        Inka1
## 29    ENSCGRG00015027292.1 protein_coding        Klf10
## 30    ENSCGRG00015022379.1 protein_coding         Layn
## 31    ENSCGRG00015002040.1 protein_coding       Lgals1
## 32    ENSCGRG00015024702.1 protein_coding        Lmod1
## 33    ENSCGRG00015002474.1 protein_coding LOC100754451
## 34    ENSCGRG00015002164.1 protein_coding        Lpin1
## 35    ENSCGRG00015023987.1 protein_coding         Mmab
## 36    ENSCGRG00015004181.1 protein_coding        Msmo1
## 37    ENSCGRG00015021408.1 protein_coding          Mvd
## 38    ENSCGRG00015015900.1 protein_coding        Nr1h3
## 39    ENSCGRG00015004526.1 protein_coding        Ostf1
## 40    ENSCGRG00015017998.1 protein_coding        Plac8
## 41    ENSCGRG00015004004.1 protein_coding         Pmvk
## 42    ENSCGRG00015028336.1 protein_coding        Psmb9
## 43    ENSCGRG00015000477.1 protein_coding       Rex1bd
## 44    ENSCGRG00015007350.1 protein_coding       S100a5
## 45    ENSCGRG00015011584.1 protein_coding        Sardh
## 46    ENSCGRG00015016144.1 protein_coding      Sh3glb2
## 47    ENSCGRG00015006534.1 protein_coding      Slc27a3
## 48    ENSCGRG00015011632.1 protein_coding      Tinagl1
## 49    ENSCGRG00015024224.1 protein_coding        Tnni1
## 50    ENSCGRG00015024489.1 protein_coding        Txnip
## 51    ENSCGRG00015000695.1 protein_coding        Uqcrb
## 52    ENSCGRG00015009254.1 protein_coding         <NA>
## 53    ENSCGRG00015019905.1         lncRNA         <NA>
## 54    ENSCGRG00015019906.1         lncRNA         <NA>
## 55    ENSCGRG00015001288.1 protein_coding         <NA>
## 56    ENSCGRG00015010289.1         lncRNA         <NA>
## 57    ENSCGRG00015020144.1         lncRNA         <NA>
## 58    ENSCGRG00015027709.1 protein_coding         <NA>
## 59    ENSCGRG00015028397.1         lncRNA         <NA>
## 60    ENSCGRG00015023652.1         lncRNA         <NA>
##                                                                                                  description
## 1                                 ATP binding cassette subfamily D member 1 [Source:NCBI gene;Acc:100769988]
## 2                                            expressed sequence AI413582 [Source:MGI Symbol;Acc:MGI:2146839]
## 3                                                 ADP ribosylation factor 5 [Source:NCBI gene;Acc:100759676]
## 4                                              arrestin domain containing 3 [Source:NCBI gene;Acc:100755882]
## 5                                  basic helix-loop-helix family member e40 [Source:NCBI gene;Acc:100765325]
## 6  Cricetulus griseus bone marrow stromal cell antigen 2 (Bst2), mRNA. [Source:RefSeq mRNA;Acc:NM_001244115]
## 7                                                   BTB domain containing 2 [Source:NCBI gene;Acc:100759139]
## 8                                                              calmodulin 2 [Source:NCBI gene;Acc:100765281]
## 9                               caspase recruitment domain family member 19 [Source:NCBI gene;Acc:100758861]
## 10                                         coiled-coil domain containing 34 [Source:NCBI gene;Acc:100752060]
## 11                                                    C-C motif chemokine 2 [Source:NCBI gene;Acc:100763833]
## 12                                                  casein kinase 1 gamma 2 [Source:NCBI gene;Acc:100758847]
## 13                                               D-dopachrome decarboxylase [Source:NCBI gene;Acc:100765163]
## 14                                                  myotonin-protein kinase [Source:NCBI gene;Acc:113837295]
## 15                                          EBP cholestenol delta-isomerase [Source:NCBI gene;Acc:100765639]
## 16                                       ergosterol biosynthesis 28 homolog [Source:NCBI gene;Acc:100756249]
## 17                                       ETS variant transcription factor 5 [Source:NCBI gene;Acc:100756646]
## 18                               farnesyl-diphosphate farnesyltransferase 1 [Source:NCBI gene;Acc:100689192]
## 19                                            four and a half LIM domains 3 [Source:NCBI gene;Acc:100751268]
## 20                                    guanidinoacetate methyltransferase [Source:MGI Symbol;Acc:MGI:1098221]
## 21                            glutamine-fructose-6-phosphate transaminase 2 [Source:NCBI gene;Acc:100767191]
## 22                                          gametogenetin-binding protein 1 [Source:NCBI gene;Acc:100773455]
## 23                                          host cell factor C1 regulator 1 [Source:NCBI gene;Acc:100768272]
## 24                                 3-hydroxy-3-methylglutaryl-CoA reductase [Source:NCBI gene;Acc:100756363]
## 25                                3-hydroxy-3-methylglutaryl-CoA synthase 1 [Source:NCBI gene;Acc:100762235]
## 26                                isopentenyl-diphosphate delta isomerase 1 [Source:NCBI gene;Acc:100752366]
## 27                               interferon-induced transmembrane protein 3 [Source:NCBI gene;Acc:100757825]
## 28                                               inka box actin regulator 1 [Source:NCBI gene;Acc:100759700]
## 29                                                   Kruppel like factor 10 [Source:NCBI gene;Acc:100775085]
## 30                            Cricetulus griseus layilin (Layn), mRNA. [Source:RefSeq mRNA;Acc:NM_001246681]
## 31                                                               galectin 1 [Source:NCBI gene;Acc:100751963]
## 32                                                              leiomodin 1 [Source:NCBI gene;Acc:100763180]
## 33                                   fatty acid-binding protein, intestinal [Source:NCBI gene;Acc:100754451]
## 34                                                                  lipin 1 [Source:NCBI gene;Acc:100689029]
## 35                                     metabolism of cobalamin associated B [Source:NCBI gene;Acc:100751940]
## 36                                             methylsterol monooxygenase 1 [Source:NCBI gene;Acc:100754784]
## 37                                     mevalonate diphosphate decarboxylase [Source:NCBI gene;Acc:100766541]
## 38                            nuclear receptor subfamily 1 group H member 3 [Source:NCBI gene;Acc:100756988]
## 39                                          osteoclast stimulating factor 1 [Source:NCBI gene;Acc:100751605]
## 40                                                    placenta associated 8 [Source:NCBI gene;Acc:100754739]
## 41                                                 phosphomevalonate kinase [Source:NCBI gene;Acc:100759992]
## 42                                            proteasome 20S subunit beta 9 [Source:NCBI gene;Acc:100767386]
## 43                              required for excision 1-B domain containing [Source:NCBI gene;Acc:100757102]
## 44                                          S100 calcium binding protein A5 [Source:NCBI gene;Acc:100771097]
## 45                                                  sarcosine dehydrogenase [Source:NCBI gene;Acc:100761121]
## 46                                    SH3-domain GRB2-like endophilin B2 [Source:MGI Symbol;Acc:MGI:2385131]
## 47                                        solute carrier family 27 member 3 [Source:NCBI gene;Acc:100752129]
## 48                              tubulointerstitial nephritis antigen like 1 [Source:NCBI gene;Acc:100760266]
## 49                                          troponin I1, slow skeletal type [Source:NCBI gene;Acc:100756622]
## 50                                          thioredoxin interacting protein [Source:NCBI gene;Acc:100756038]
## 51                                        cytochrome b-c1 complex subunit 7 [Source:NCBI gene;Acc:100762374]
## 52                                                                                                      <NA>
## 53                                                                                                      <NA>
## 54                                                                                                      <NA>
## 55                                                                                                      <NA>
## 56                                                                                                      <NA>
## 57                                                                                                      <NA>
## 58                                                                                                      <NA>
## 59                                                                                                      <NA>
## 60                                                                                                      <NA>
##    module_color   kMEgrey    kMEred      kMEblue kMEturquoise    kMEyellow
## 1           red 0.2931748 0.6566967 -0.323193983  0.469138026  0.068734362
## 2           red 0.4911832 0.8820447 -0.302376552  0.404907298 -0.016741657
## 3           red 0.5006515 0.8316578 -0.321435489  0.145778178 -0.196435260
## 4           red 0.3300320 0.5180787  0.011350453  0.158591903 -0.012376627
## 5           red 0.5168649 0.5267533  0.172328437  0.304489718  0.246334153
## 6           red 0.2784712 0.6307271 -0.372069061 -0.027127299 -0.367862843
## 7           red 0.3665416 0.5403451 -0.052868784  0.228256257  0.036887269
## 8           red 0.6466055 0.8478705 -0.177159417 -0.003481388 -0.254548068
## 9           red 0.5216074 0.7572386 -0.249748091  0.505582022  0.207529037
## 10          red 0.4736144 0.4391859  0.051549778 -0.176496744 -0.185066659
## 11          red 0.4594409 0.8086342 -0.309324346  0.288288150 -0.121152137
## 12          red 0.5921068 0.7055682 -0.003198828  0.355347106  0.171664967
## 13          red 0.3924149 0.5946321 -0.126697258 -0.210870015 -0.394084526
## 14          red 0.3739294 0.6008675 -0.203638718  0.292735556  0.034131393
## 15          red 0.6228300 0.7354876 -0.014303431  0.010120686 -0.122560024
## 16          red 0.5258876 0.7718207 -0.148419042  0.361080121  0.095549858
## 17          red 0.3434900 0.6765373 -0.188676243  0.280161246 -0.036109938
## 18          red 0.5295493 0.8858692 -0.215065151  0.470097178  0.083229145
## 19          red 0.6649646 0.7752537  0.090509875  0.300648107  0.153640315
## 20          red 0.4066330 0.5733301 -0.106936594  0.098270911 -0.107023910
## 21          red 0.4349714 0.6134333 -0.121924943  0.420028152  0.170005259
## 22          red 0.3822148 0.6704158 -0.321990046  0.168946397 -0.186256738
## 23          red 0.5587742 0.6880478 -0.177759228  0.017057050 -0.214946372
## 24          red 0.4222297 0.7332311 -0.105415452  0.559788476  0.239637248
## 25          red 0.3124116 0.7929437 -0.350206496  0.488862319  0.044208142
## 26          red 0.4978003 0.6937751  0.037300552  0.508072772  0.326980648
## 27          red 0.1701821 0.4508234 -0.440307051 -0.237775050 -0.557649887
## 28          red 0.4816983 0.5648344 -0.042983627  0.366288864  0.164172598
## 29          red 0.4806052 0.5364292  0.189992830  0.128247762  0.089577455
## 30          red 0.5396566 0.8163822 -0.205285431  0.576156666  0.214164323
## 31          red 0.4793802 0.8324250 -0.413428834  0.075607514 -0.333527105
## 32          red 0.2703982 0.4746861 -0.145779411  0.250712225  0.043417034
## 33          red 0.4076906 0.7181378 -0.248680427  0.188313899 -0.162686024
## 34          red 0.5187586 0.8017157 -0.114324458  0.238169621 -0.055668950
## 35          red 0.4758000 0.7172471 -0.046308548  0.433648610  0.173144720
## 36          red 0.3170594 0.7694082 -0.363362254  0.183321667 -0.219015288
## 37          red 0.6465510 0.7851926  0.090239546  0.140003394  0.024727796
## 38          red 0.4801598 0.6182440 -0.014700181  0.203701661  0.047355151
## 39          red 0.4385370 0.7023593 -0.313042782  0.159997872 -0.150854054
## 40          red 0.3190116 0.6510780 -0.329627481  0.081673601 -0.261673313
## 41          red 0.5973575 0.7839251 -0.068609301  0.117452814 -0.078419974
## 42          red 0.4079704 0.5098870 -0.126422508 -0.043522527 -0.206659990
## 43          red 0.6262589 0.8837893 -0.217976787  0.275748064 -0.044359649
## 44          red 0.4167550 0.7445623 -0.397184710  0.215164541 -0.201174209
## 45          red 0.4074147 0.4779580 -0.007983570  0.349992341  0.197821822
## 46          red 0.4634826 0.7236710 -0.217603926  0.325434805  0.059135331
## 47          red 0.3553106 0.6747635 -0.282217287  0.230584305 -0.123380470
## 48          red 0.6581539 0.8619974 -0.120362747  0.245252163  0.001850951
## 49          red 0.3803600 0.4533724 -0.020519671  0.330209371  0.138076276
## 50          red 0.4854913 0.5994091  0.177661554  0.344615998  0.241449572
## 51          red 0.5235132 0.5255564  0.015636077 -0.033668478 -0.078139923
## 52          red 0.6212107 0.6198008  0.046852522  0.151716456  0.096431447
## 53          red 0.5023459 0.7709448 -0.256895068  0.251976750 -0.088466037
## 54          red 0.5306434 0.7504645 -0.189581717  0.200426390 -0.083612395
## 55          red 0.6713103 0.8581420 -0.119036259  0.210647972 -0.033903052
## 56          red 0.3911793 0.5711140 -0.241460166  0.094358939 -0.160773150
## 57          red 0.5157654 0.8234715 -0.187289379  0.420477569  0.104715549
## 58          red 0.3640558 0.5167059 -0.186257286 -0.032949927 -0.242838574
## 59          red 0.4203670 0.7098278 -0.269962738  0.413350918  0.055284100
## 60          red 0.4595986 0.5855959 -0.076022464  0.299518963  0.111966347
##        kMEorange     kMEgreen     kMEblack    kMEpurple     kMEbrown
## 1  -0.0435219513  0.449768889  0.049404744 -0.359284816 -0.133523679
## 2   0.2362442031  0.504657957  0.106176354 -0.368091796 -0.184849923
## 3   0.1599149880  0.442436620  0.093219935 -0.170940379 -0.044349852
## 4   0.1139481957  0.049558120  0.137116207  0.008835176  0.082383067
## 5  -0.0009766817 -0.083039469 -0.186118624 -0.145367803 -0.275755920
## 6   0.0531348188  0.405072398  0.224394990  0.090876258  0.188964877
## 7   0.2249085228  0.176667122  0.017729570 -0.129193123 -0.216741444
## 8   0.1157298498  0.240308078  0.048543496  0.020709431 -0.043095530
## 9  -0.0154026017  0.489406270 -0.233777328 -0.679520539 -0.461668771
## 10  0.1419798084 -0.002036466 -0.064870178  0.029985907  0.069115452
## 11  0.1086168401  0.425152132  0.088760879 -0.195984946 -0.147900851
## 12  0.2230251957  0.192211385 -0.159735034 -0.367261705 -0.403810833
## 13  0.0672379684  0.075868441  0.231120748  0.369259439  0.166602051
## 14  0.0093203422  0.328469416 -0.054160463 -0.320911891 -0.243504216
## 15  0.1244584368  0.060028700 -0.102156280  0.095650046 -0.121256942
## 16  0.2734520092  0.363146199 -0.119502821 -0.401399435 -0.299827249
## 17  0.1854334129  0.333826682  0.186863936 -0.185757388 -0.076954838
## 18  0.0730004669  0.373284080 -0.008430698 -0.293405533 -0.254195255
## 19  0.0034479429  0.073138112 -0.166622768 -0.253277384 -0.284942926
## 20  0.1371560409  0.210853754 -0.034666040 -0.097404983 -0.114183938
## 21  0.0520197603  0.288571282 -0.141131964 -0.463281485 -0.331758721
## 22 -0.0037262441  0.422249793  0.141008877 -0.233807416 -0.028831896
## 23  0.0278956566  0.233498092  0.053459283 -0.055453792  0.003549393
## 24  0.0864554028  0.261677241 -0.074158803 -0.266346693 -0.269696440
## 25  0.1616875433  0.489282303  0.117775854 -0.230546224 -0.189384759
## 26  0.1025979253  0.114939249 -0.210040477 -0.262278287 -0.333112738
## 27  0.0225880328  0.380583716  0.324969836  0.252975964  0.303379119
## 28  0.1467112337  0.230911361 -0.160808537 -0.437260016 -0.298758745
## 29  0.1621292360 -0.111359436 -0.037483610  0.054394480 -0.045378260
## 30  0.0671837755  0.436567138 -0.127447533 -0.594364882 -0.407926847
## 31 -0.0024612086  0.454541166  0.232397922 -0.037570243 -0.005538437
## 32 -0.1262308538  0.200398834 -0.047214190 -0.172083237 -0.180227556
## 33  0.0813340230  0.333869975  0.160754361 -0.099545063 -0.090250289
## 34  0.1358875218  0.229632144  0.053806200 -0.056688988 -0.081201576
## 35  0.2545221574  0.204355279 -0.026927086 -0.214495722 -0.134554399
## 36  0.1662267409  0.415661381  0.277853464  0.058193664 -0.009287451
## 37  0.1703533046  0.011195470 -0.105132349  0.031451879 -0.170739402
## 38  0.0769784762  0.097746187 -0.031513200 -0.099422730 -0.228969347
## 39 -0.0071240161  0.400060829  0.040449592 -0.251316422 -0.158281047
## 40  0.0281349174  0.324431283  0.275743444  0.052365487 -0.021415961
## 41  0.1511266223  0.167255662 -0.058491637 -0.021926769 -0.115771713
## 42  0.0912823475  0.148323220  0.164199914  0.058819036 -0.007567427
## 43  0.0257246327  0.367953341 -0.055722445 -0.307817619 -0.257998592
## 44 -0.0918566195  0.448007492  0.230961383 -0.168563168 -0.173804724
## 45 -0.0134694866  0.186835638 -0.264683660 -0.461617131 -0.248286500
## 46  0.0300189085  0.429009744 -0.170786623 -0.494129895 -0.314913501
## 47  0.1016939766  0.385379711  0.073298686 -0.140227742 -0.283145891
## 48  0.1103592517  0.245056061 -0.094384975 -0.188033887 -0.257974584
## 49  0.0054317730  0.098739438 -0.096637719 -0.221083489 -0.174617483
## 50  0.2055778462 -0.063874758 -0.080219643 -0.068361926 -0.156056356
## 51  0.0582170378  0.049884630 -0.052472108 -0.094035472 -0.223254097
## 52  0.0055071763  0.093699027 -0.230792697 -0.335865806 -0.358507107
## 53  0.1084901953  0.363944165  0.163337565 -0.205924861 -0.168800626
## 54  0.1485066353  0.290556905  0.151826001 -0.170176069 -0.158613861
## 55  0.0282180201  0.243557557 -0.089554780 -0.193516932 -0.196023272
## 56 -0.1266555676  0.348720489  0.075632025 -0.266217787 -0.017894998
## 57  0.0269544414  0.357112788 -0.057865668 -0.353765812 -0.312050630
## 58 -0.1414213059  0.205549574  0.205557087  0.018224373  0.018912331
## 59  0.0099653136  0.436634460 -0.027322923 -0.419069203 -0.240321442
## 60  0.0303930236  0.178291406 -0.034114374 -0.248402777 -0.269934707
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_IE.UTF-8/en_IE.UTF-8/en_IE.UTF-8/C/en_IE.UTF-8/en_IE.UTF-8
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] plyr_1.8.6                  RColorBrewer_1.1-2         
##  [3] biomaRt_2.50.3              langevitour_0.0.1          
##  [5] SeuratWrappers_0.3.0        ggExtra_0.9                
##  [7] scater_1.22.0               scuttle_1.4.0              
##  [9] patchwork_1.1.1             ggpubr_0.4.0               
## [11] WebGestaltR_0.4.4           viridis_0.6.2              
## [13] viridisLite_0.4.0           SeuratObject_4.0.3         
## [15] Seurat_4.0.5                DT_0.21                    
## [17] scWGCNA_0.0.0.9000          WGCNA_1.70-3               
## [19] fastcluster_1.2.3           dynamicTreeCut_1.63-1      
## [21] ggridges_0.5.3              scales_1.1.1               
## [23] writexl_1.4.0               readxl_1.3.1               
## [25] forcats_0.5.1               stringr_1.4.0              
## [27] dplyr_1.0.7                 purrr_0.3.4                
## [29] readr_2.1.0                 tidyr_1.1.4                
## [31] tibble_3.1.6                ggplot2_3.3.5              
## [33] tidyverse_1.3.1             Matrix_1.3-4               
## [35] DropletUtils_1.14.1         SingleCellExperiment_1.16.0
## [37] SummarizedExperiment_1.24.0 Biobase_2.54.0             
## [39] GenomicRanges_1.46.0        GenomeInfoDb_1.30.0        
## [41] IRanges_2.28.0              S4Vectors_0.32.2           
## [43] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
## [45] matrixStats_0.61.0         
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.3            scattermore_0.7          
##   [3] R.methodsS3_1.8.1         bit64_4.0.5              
##   [5] knitr_1.36                irlba_2.3.3              
##   [7] DelayedArray_0.20.0       R.utils_2.11.0           
##   [9] data.table_1.14.2         rpart_4.1-15             
##  [11] KEGGREST_1.34.0           RCurl_1.98-1.5           
##  [13] doParallel_1.0.16         generics_0.1.1           
##  [15] ScaledMatrix_1.2.0        preprocessCore_1.56.0    
##  [17] cowplot_1.1.1             RSQLite_2.2.8            
##  [19] RANN_2.6.1                future_1.23.0            
##  [21] bit_4.0.4                 tzdb_0.2.0               
##  [23] spatstat.data_2.1-0       xml2_1.3.2               
##  [25] lubridate_1.8.0           httpuv_1.6.3             
##  [27] assertthat_0.2.1          xfun_0.30                
##  [29] hms_1.1.1                 jquerylib_0.1.4          
##  [31] evaluate_0.14             promises_1.2.0.1         
##  [33] progress_1.2.2            fansi_0.5.0              
##  [35] dbplyr_2.1.1              igraph_1.2.8             
##  [37] DBI_1.1.1                 htmlwidgets_1.5.4        
##  [39] apcluster_1.4.8           spatstat.geom_2.3-0      
##  [41] ellipsis_0.3.2            RSpectra_0.16-0          
##  [43] crosstalk_1.2.0           backports_1.3.0          
##  [45] bookdown_0.25             deldir_1.0-6             
##  [47] sparseMatrixStats_1.6.0   vctrs_0.3.8              
##  [49] remotes_2.4.1             ROCR_1.0-11              
##  [51] abind_1.4-5               cachem_1.0.6             
##  [53] withr_2.4.2               vroom_1.5.6              
##  [55] checkmate_2.0.0           sctransform_0.3.2        
##  [57] prettyunits_1.1.1         goftest_1.2-3            
##  [59] svglite_2.0.0             cluster_2.1.2            
##  [61] lazyeval_0.2.2            crayon_1.4.2             
##  [63] labeling_0.4.2            edgeR_3.36.0             
##  [65] pkgconfig_2.0.3           vipor_0.4.5              
##  [67] nlme_3.1-153              nnet_7.3-16              
##  [69] rlang_0.4.12              globals_0.14.0           
##  [71] lifecycle_1.0.1           miniUI_0.1.1.1           
##  [73] filelock_1.0.2            BiocFileCache_2.2.0      
##  [75] rsvd_1.0.5                modelr_0.1.8             
##  [77] cellranger_1.1.0          polyclip_1.10-0          
##  [79] lmtest_0.9-39             rngtools_1.5.2           
##  [81] carData_3.0-4             Rhdf5lib_1.16.0          
##  [83] zoo_1.8-9                 beeswarm_0.4.0           
##  [85] reprex_2.0.1              base64enc_0.1-3          
##  [87] whisker_0.4               png_0.1-7                
##  [89] bitops_1.0-7              R.oo_1.24.0              
##  [91] KernSmooth_2.23-20        rhdf5filters_1.6.0       
##  [93] Biostrings_2.62.0         blob_1.2.2               
##  [95] DelayedMatrixStats_1.16.0 doRNG_1.8.2              
##  [97] parallelly_1.28.1         rstatix_0.7.0            
##  [99] jpeg_0.1-9                ggsignif_0.6.3           
## [101] beachmat_2.10.0           memoise_2.0.0            
## [103] magrittr_2.0.1            ica_1.0-2                
## [105] zlibbioc_1.40.0           compiler_4.1.2           
## [107] dqrng_0.3.0               fitdistrplus_1.1-6       
## [109] cli_3.1.0                 XVector_0.34.0           
## [111] listenv_0.8.0             pbapply_1.5-0            
## [113] htmlTable_2.3.0           Formula_1.2-4            
## [115] MASS_7.3-54               mgcv_1.8-38              
## [117] tidyselect_1.1.1          stringi_1.7.5            
## [119] highr_0.9                 yaml_2.2.1               
## [121] BiocSingular_1.10.0       locfit_1.5-9.4           
## [123] latticeExtra_0.6-29       ggrepel_0.9.1            
## [125] grid_4.1.2                sass_0.4.0               
## [127] tools_4.1.2               future.apply_1.8.1       
## [129] parallel_4.1.2            rstudioapi_0.13          
## [131] foreach_1.5.1             foreign_0.8-81           
## [133] gridExtra_2.3             farver_2.1.0             
## [135] rmdformats_1.0.3          Rtsne_0.15               
## [137] BiocManager_1.30.16       digest_0.6.28            
## [139] FNN_1.1.3                 shiny_1.7.1              
## [141] Rcpp_1.0.7                car_3.0-12               
## [143] broom_0.7.10              later_1.3.0              
## [145] RcppAnnoy_0.0.19          httr_1.4.2               
## [147] AnnotationDbi_1.56.2      colorspace_2.0-2         
## [149] XML_3.99-0.8              rvest_1.0.2              
## [151] fs_1.5.0                  tensor_1.5               
## [153] reticulate_1.22           splines_4.1.2            
## [155] uwot_0.1.10               spatstat.utils_2.2-0     
## [157] flexmix_2.3-17            plotly_4.10.0            
## [159] systemfonts_1.0.2         xtable_1.8-4             
## [161] jsonlite_1.7.2            modeltools_0.2-23        
## [163] R6_2.5.1                  Hmisc_4.6-0              
## [165] pillar_1.6.4              htmltools_0.5.2          
## [167] mime_0.12                 glue_1.5.0               
## [169] fastmap_1.1.0             BiocParallel_1.28.0      
## [171] BiocNeighbors_1.12.0      codetools_0.2-18         
## [173] utf8_1.2.2                lattice_0.20-45          
## [175] bslib_0.3.1               spatstat.sparse_2.0-0    
## [177] curl_4.3.2                ggbeeswarm_0.6.0         
## [179] leiden_0.3.9              GO.db_3.14.0             
## [181] survival_3.2-13           limma_3.50.0             
## [183] rmarkdown_2.13            munsell_0.5.0            
## [185] rhdf5_2.38.0              GenomeInfoDbData_1.2.7   
## [187] iterators_1.0.13          HDF5Array_1.22.1         
## [189] impute_1.68.0             haven_2.4.3              
## [191] reshape2_1.4.4            gtable_0.3.0             
## [193] spatstat.core_2.3-1